What will you learn from this book?

Wouldn't it be great if there were a statistics book that made
histograms, probability distributions, and chi square analysis more
enjoyable than going to the dentist? Head First Statistics brings this
typically dry subject to life, teaching you all the fundamentals of the
discipline using interactive, real-world scenarios, ranging from
analyzing sports stats to gambling to prescription drug testing.

Whether you're taking a stats course, preparing for the AP Statistics
exam, or just curious about statistical analysis, Head First's
brain-friendly formula will help you not only master statistics, but show
you how you can apply statistical principles in everyday life.

[Cover illustrations with handwritten notes on using probability, hypothesis tests for assessing the validity of medical data, samples and populations, and expectation and variance at the slot machine]

Why does this book look so different? 

We think your time is too valuable to spend struggling with new
concepts. Using the latest research in cognitive science and learning
theory to craft a multi-sensory learning experience, Head First
Statistics uses a visually rich format designed for the way your brain
works, not a text-heavy approach that puts you to sleep.


“Head First Statistics is by far the most entertaining, attention-catching
study guide on the market. By presenting the material in an engaging manner,
it provides students with a comfortable way to learn an otherwise
cumbersome subject. The explanation of the topics is presented in
a manner comprehensible to students of all levels.”

— Ariana Anderson, Teaching Fellow/PhD candidate in Statistics, UCLA

“Head First is an intuitive way to understand statistics using simple,
real-life examples that make learning fun and natural.”

— Michael Prerau, computational neuroscientist and statistics instructor,
Boston University
Advance Praise for Head First Statistics


“Head First Statistics is by far the most entertaining, attention-catching study guide on the market. By 
presenting the material in an engaging manner, it provides students with a comfortable way to learn an 
otherwise cumbersome subject. The explanation of the topics is presented in a manner comprehensible 
to students of all levels.” 

— Ariana Anderson, Teaching Fellow/PhD candidate in Statistics, UCLA 


“Head First Statistics is deceptively friendly. Breeze through the explanations and exercises and you just 
may find yourself raising the topic of normal vs. Poisson distribution in ordinary social conversation, 
which I can assure you is not advised!” 


— Gary Wolf, Contributing Editor, Wired Magazine


“Dawn Griffiths has split some very complicated concepts into much smaller, less frightening, bits of 
stuff that real-life people will find very easy to digest. Lots of graphics and photos make the material 
very approachable, and I have developed quite a crush on the attractive lady model who is asking about 
gumballs on page 458.” 

— Bruce Frey, author of Statistics Hacks 


“Head First is an intuitive way to understand statistics using simple, real-life examples that make learning 
fun and natural.” 

— Michael Prerau, computational neuroscientist and statistics instructor,
Boston University


“Thought Head First was just for computer nerds? Try the brain-friendly way with statistics and you’ll 
change your mind. It really works.” 

— Andy Parker 


“This book is a great way for students to learn statistics — it is entertaining, comprehensive, and easy to 
understand. A perfect solution!” 

— Danielle Levitt 


“Down with dull statistics books! Even my cat liked this one.” 

— Cary Collett 



Praise for other Head First books 


“Kathy and Bert’s Head First Java transforms the printed page into the closest thing to a GUI you’ve ever 
seen. In a wry, hip manner, the authors make learning Java an engaging ‘what’re they gonna do next?’
experience.” 

— Warren Keuffel, Software Development Magazine


“Beyond the engaging style that drags you forward from know-nothing into exalted Java warrior status, Head 
First Java covers a huge amount of practical matters that other texts leave as the dreaded “exercise for the 
reader...” It’s clever, wry, hip and practical — there aren’t a lot of textbooks that can make that claim and live 
up to it while also teaching you about object serialization and network launch protocols.”

— Dr. Dan Russell, Director of User Sciences and Experience Research
IBM Almaden Research Center (and teaches Artificial Intelligence at Stanford 
University) 

“It’s fast, irreverent, fun, and engaging. Be careful — you might actually learn something!”

— Ken Arnold, former Senior Engineer at Sun Microsystems 
Co-author (with James Gosling, creator of Java), The Java Programming 
Language 


“I feel like a thousand pounds of books have just been lifted off of my head.” 

— Ward Cunningham, inventor of the Wiki and founder of the Hillside Group 


“Just the right tone for the geeked-out, casual-cool guru coder in all of us. The right reference for
practical development strategies — gets my brain going without having to slog through a bunch of tired stale
professor-speak.”



“There are books you buy, books you keep, books you keep on your desk, and thanks to O’Reilly and the 
Head First crew, there is the penultimate category, Head First books. They’re the ones that are dog-eared, 
mangled, and carried everywhere. Head First SQL is at the top of my stack. Heck, even the PDF I have 
for review is tattered and torn.” 

— Bill Sawyer, ATG Curriculum Manager, Oracle


“This book’s admirable clarity, humor and substantial doses of clever make it the sort of book that helps 
even non-programmers think well about problem-solving.” 

— Cory Doctorow, co-editor of Boing Boing
Author, Down and Out in the Magic Kingdom
and Someone Comes to Town, Someone Leaves Town 





“I received the book yesterday and started to read it...and I couldn’t stop. This is definitely tres ‘cool.’ It is 
fun, but they cover a lot of ground and they are right to the point. I’m really impressed.” 

— Erich Gamma, IBM Distinguished Engineer, and co-author of Design 
Patterns 


“One of the funniest and smartest books on software design I’ve ever read.” 

— Aaron LaBerge, VP Technology, ESPN.com


“What used to be a long trial and error learning process has now been reduced neatly into an engaging 
paperback.” 

— Mike Davidson, CEO, Newsvine, Inc.


“Elegant design is at the core of every chapter here, each concept conveyed with equal doses of 
pragmatism and wit.”

— Ken Goldstein, Executive Vice President, Disney Online

“I ♥ Head First HTML with CSS & XHTML — it teaches you everything you need to learn in a ‘fun
coated’ format.”

— Sally Applin, UI Designer and Artist

“Usually when reading through a book or article on design patterns, I’d have to occasionally stick myself 
in the eye with something just to make sure I was paying attention. Not with this book. Odd as it may 
sound, this book makes learning about design patterns fun.

“While other books on design patterns are saying ‘Buehler... Buehler... Buehler...’ this book is on the
float belting out ‘Shake it up, baby!’”

— Eric Wuehler 


“I literally love this book. In fact, I kissed this book in front of my wife.” 


— Satish Kumar 
what the population will be like and coming up with a way of saying how reliable your
predictions are. In this chapter, we’ll show you how knowing your sample helps you

You’ve seen how you can use point estimators to estimate the precise value of the

your estimate is completely accurate? After all, your assumptions about the population
rely on just one sample, and what if your sample’s off? In this chapter, you’ll see another
way of estimating population statistics, one that allows for uncertainty. Pick up your
probability tables, and we’ll show you the ins and outs of confidence intervals.

Step 4: Find the confidence limits
Start by finding Z
Rewrite the inequality in terms of m
Finally, find the value of X
You’ve found the confidence interval

Look at the Evidence

Not everything you’re told is absolutely certain. 

The trouble is, how do you know when what you’re being told isn’t right? Hypothesis 
tests give you a way of using samples to test whether or not statistical claims are likely 
to be true. They give you a way of weighing the evidence and testing whether extreme 
results can be explained by mere coincidence, or whether there are darker forces at 
work. Come with us on a ride through this chapter, and we’ll show you how you can use 
hypothesis tests to confirm or allay your deepest suspicions. 
There’s Something Going On...

Sometimes things don’t turn out quite the way you expect. 

When you model a situation using a particular probability distribution, you have a 
good idea of how things are likely to turn out long-term. But what happens if there are 
differences between what you expect and what you get? How can you tell whether 
your discrepancies come down to normal fluctuations, or whether they’re a sign of 
an underlying problem with your probability model instead? In this chapter, we’ll
show you how you can use the χ² distribution to analyze your results and sniff out
suspicious results.


There may be trouble ahead at Fat Dan’s Casino 

Let’s start with the slot machines 

The χ² test assesses difference

So what does the test statistic represent?

Two main uses of the χ² distribution
ν represents degrees of freedom
What’s the significance?

Hypothesis testing with χ²

You’ve solved the slot machine mystery

Fat Dan has another problem

The χ² distribution can test for independence

You can find the expected frequencies using probability 

So what are the frequencies? 


We still need to calculate degrees of freedom 
Generalizing the degrees of freedom 
And the formula is... 

You’ve saved the casino 
correlation and regression

What’s My Line? 

Have you ever wondered how two things are connected? 

So far we’ve looked at statistics that tell you about just one variable — like men’s height, 
points scored by basketball players, or how long gumball flavor lasts — but there are other 
statistics that tell you about the connection between variables. Seeing how things are 
connected can give you a lot of information about the real world, information that you can 
use to your advantage. Stay with us while we show you the key to spotting connections: 
correlation and regression. 



Let’s analyze sunshine and attendance 

Exploring types of data 

Visualizing bivariate data 

Scatter diagrams show you patterns 

Correlation vs. causation 

Predict values with a line of best fit 

Your best guess is still a guess 

We need to minimize the errors 

Introducing the sum of squared errors 

Find the equation for the line of best fit 

Finding the slope for the line of best fit 

Finding the slope for the line of best fit, continued 

We’ve found b, but what about a? 

You’ve made the connection 
Let’s look at some correlations 

The correlation coefficient measures how well the line fits the data 
There’s a formula for calculating the correlation coefficient, r 
Find r for the concert data 
Find r for the concert data, continued 
The Top Ten Things (we didn’t cover)

Even after all that, there’s a bit more. There are just a few more
things we think you need to know. We wouldn’t feel right about ignoring them,
even though they only need a brief mention. So before you put the book down,
take a read through these short but important statistics tidbits. 
Looking Things Up

Where would you be without your trusty probability tables? 

Understanding your probability distributions isn’t quite enough. For some of them, you 
need to be able to look up your probabilities in standard probability tables. In this 
appendix you’ll find tables for the normal, t and χ² distributions so you can look up
probabilities to your heart’s content.
t-distribution critical values
χ² critical values

how to use this book


Who is this book for? 


If you can answer “yes” to all of these:

① Do you need to understand statistics for a course, for
your line of work, or just because you think it’s about
time you learned what standard deviation means or
how to find the probability of winning at roulette?

② Do you want to learn, understand, and remember
how to use probability and statistics to get the right
results, every time?

③ Do you prefer stimulating dinner party conversation to
dry, dull, academic lectures?

this book is for you.


Who should probably back away from this book? 

If you can answer “yes” to any of these:

① Are you someone who’s never studied basic algebra?
(You don’t need to be advanced, but you should
understand basic addition and subtraction, multiplication
and division.)

② Are you a kick-butt statistician looking for a reference
book?

③ Are you afraid to try something different? Would you
rather have a root canal than mix stripes with plaid? Do
you believe that a statistics book can’t be serious if Venn
diagrams are anthropomorphized?

this book is not for you.



We know what you’re thinking

“How can this be a serious book on statistics?”
“What’s with all the graphics?”
“Can I actually learn it this way?”




We know what your brain is thinking

Your brain craves novelty. It’s always searching, scanning, waiting for something 
unusual. It was built that way, and it helps you stay alive. 

So what does your brain do with all the routine, ordinary, normal things you 
encounter? Everything it can to stop them from interfering with the brain’s 
real job — recording things that matter. It doesn’t bother saving the boring 
things; they never make it past the “this is obviously not important” filter. 

How does your brain know what’s important? Suppose you’re out for a 
day hike and a tiger jumps in front of you, what happens inside your head 
and body? 

Neurons fire. Emotions crank up. Chemicals surge. 

And that’s how your brain knows... 

This must be important! Don’t forget it! 

But imagine you’re at home, or in a library. It’s a safe, warm, tiger-free zone. 
You’re studying. Getting ready for an exam. Or trying to learn some tough 
technical topic your boss thinks will take a week, ten days at the most. 

Just one problem. Your brain’s trying to do you a big favor. It’s trying 
to make sure that this obviously non-important content doesn’t clutter 
up scarce resources. Resources that are better spent storing the really 
big things. Like tigers. Like the danger of fire. Like how you should 
never have posted those “party” photos on your Facebook page. 

And there’s no simple way to tell your brain, “Hey brain, thank you 
very much, but no matter how dull this book is, and how little I’m 
registering on the emotional Richter scale right now, I really do want 
you to keep this stuff around.” 
Metacognition: thinking about thinking

If you really want to learn, and you want to learn more quickly and more 
deeply, pay attention to how you pay attention. Think about how you think. 
Learn how you learn. 

Most of us did not take courses on metacognition or learning theory when we 
were growing up. We were expected to learn, but rarely taught to learn. 


I wonder how 
I can trick my brain 
into remembering 
this stuff... 



The trick is to get your brain to see the new material you’re learning as 
Really Important. Crucial to your well-being. As important as a tiger. 
Otherwise, you’re in for a constant battle, with your brain doing its best to 
keep the new content from sticking. 


But we assume that if you’re holding this book, you really want to learn 
statistics. And you probably don’t want to spend a lot of time. If you want to 
use what you read in this book, you need to remember what you read. And for 
that, you’ve got to understand it. To get the most from this book, or any book 
or learning experience, take responsibility for your brain. Your brain on this 
content. 


So just how DO you get your brain to treat statistics 
like it was a hungry tiger? 


There’s the slow, tedious way, or the faster, more effective way. The 
slow way is about sheer repetition. You obviously know that you are able to learn 
and remember even the dullest of topics if you keep pounding the same thing into your 
brain. With enough repetition, your brain says, “This doesn’t feel important to him, but he 
keeps looking at the same thing over and over and over, so I suppose it must be.” 


The faster way is to do anything that increases brain activity, especially different 
types of brain activity. The things on the previous page are a big part of the solution, 
and they’re all things that have been proven to help your brain work in your favor. For 
example, studies show that putting words within the pictures they describe (as opposed to 
somewhere else in the page, like a caption or in the body text) causes your brain to try to 
make sense of how the words and picture relate, and this causes more neurons to fire.
More neurons firing = more chances for your brain to get that this is something worth 
paying attention to, and possibly recording. 


A conversational style helps because people tend to pay more attention when they 
perceive that they’re in a conversation, since they’re expected to follow along and hold up 
their end. The amazing thing is, your brain doesn’t necessarily care that the “conversation” 
is between you and a book! On the other hand, if the writing style is formal and dry, your 
brain perceives it the same way you experience being lectured to while sitting in a roomful 
of passive attendees. No need to stay awake. 


But pictures and conversational style are just the beginning... 




Here's what WE did: 

We used pictures, because your brain is tuned for visuals, not text. As far as your brain’s 
concerned, a picture really is worth a thousand words. And when text and pictures work 
together, we embedded the text in the pictures because your brain works more effectively 
when the text is within the thing the text refers to, as opposed to in a caption or buried in the 
text somewhere. 

We used redundancy, saying the same thing in different ways and with different media types, 
and multiple senses, to increase the chance that the content gets coded into more than one area 
of your brain. 



We used concepts and pictures in unexpected ways because your brain is tuned for novelty, 
and we used pictures and ideas with at least some emotional content, because your brain 
is tuned to pay attention to the biochemistry of emotions. That which causes you to feel 
something is more likely to be remembered, even if that feeling is nothing more than a little
humor, surprise, or interest.

We used a personalized, conversational style, because your brain is tuned to pay more 
attention when it believes you’re in a conversation than if it thinks you’re passively listening 
to a presentation. Your brain does this even when you’re reading. 

We included more than 80 activities, because your brain is tuned to learn and remember 
more when you do things than when you read about things. And we made the exercises 
challenging-yet-do-able, because that’s what most people prefer. 

We used multiple learning styles, because you might prefer step-by-step procedures, while
someone else wants to understand the big picture first, and someone else just wants to see 
an example. But regardless of your own learning preference, everyone benefits from seeing the 
same content represented in multiple ways. 

We include content for both sides of your brain, because the more of your brain you 
engage, the more likely you are to learn and remember, and the longer you can stay focused. 
Since working one side of the brain often means giving the other side a chance to rest, you 
can be more productive at learning for a longer period of time. 



And we included stories and exercises that present more than one point of view, 
because your brain is tuned to learn more deeply when it’s forced to make evaluations and 
judgments. 




We included challenges, with exercises, and by asking questions that don’t always have 
a straight answer, because your brain is tuned to learn and remember when it has to work at 
something. Think about it — you can’t get your body in shape just by watching people at the 
gym. But we did our best to make sure that when you’re working hard, it’s on the right things. 
That you’re not spending one extra dendrite processing a hard-to-understand example,
or parsing difficult, jargon-laden, or overly terse text. 

We used people. In stories, examples, pictures, etc., because, well, because you’re a person.
And your brain pays more attention to people than it does to things. 
Here's what YOU can do to bend
your brain into submission

So, we did our part. The rest is up to you. These tips are a 
starting point; listen to your brain and figure out what works 
for you and what doesn’t. Try new things. 


① Slow down. The more you understand,
the less you have to memorize.

Don’t just read. Stop and think. When the 
book asks you a question, don’t just skip to the 
answer. Imagine that someone really is asking 
the question. The more deeply you force your 
brain to think, the better chance you have of 
learning and remembering. 

② Do the exercises. Write your own notes.

We put them in, but if we did them for you, 
that would be like having someone else do 
your workouts for you. And don’t just look at 
the exercises. Use a pencil. There’s plenty of 
evidence that physical activity while learning 
can increase the learning. 

③ Read the “There are No Dumb Questions”

That means all of them. They’re not optional
sidebars — they’re part of the core content!
Don’t skip them.

④ Make this the last thing you read before
bed. Or at least the last challenging thing. 

Part of the learning (especially the transfer to 
long-term memory) happens after you put the
book down. Your brain needs time on its own, to 
do more processing. If you put in something new 
during that processing time, some of what you 
just learned will be lost. 

⑤ Drink water. Lots of it.

Your brain works best in a nice bath of fluid. 
Dehydration (which can happen before you ever 
feel thirsty) decreases cognitive function. 


⑥ Talk about it. Out loud.

Speaking activates a different part of the brain. 

If you’re trying to understand something, or 
increase your chance of remembering it later, say 
it out loud. Better still, try to explain it out loud 
to someone else. You’ll learn more quickly, and 
you might uncover ideas you hadn’t known were 
there when you were reading about it. 

⑦ Listen to your brain.

Pay attention to whether your brain is getting 
overloaded. If you find yourself starting to skim 
the surface or forget what you just read, it’s time 
for a break. Once you go past a certain point, you 
won’t learn faster by trying to shove more in, and 
you might even hurt the process. 

⑧ Feel something.

Your brain needs to know that this matters. Get 
involved with the stories. Make up your own 
captions for the photos. Groaning over a bad joke 
is still better than feeling nothing at all. 

⑨ Practice solving problems!

There’s only one way to truly master statistics: 
practice answering questions. And that’s what 
you’re going to do throughout this book. Using 
statistics is a skill, and the only way to get good at it is 
to practice. We’re going to give you a lot of practice: 
every chapter has exercises that pose problems for 
you to solve. Don’t just skip over them — a lot of the 
learning happens when you solve the exercises. We 
included a solution to each exercise — don’t be afraid 
to peek at the solution if you get stuck! (It’s easy 
to get snagged on something small.) But try to solve 
the problem before you look at the solution. And 
definitely make sure you understand what’s going on 
before you move on to the next part of the book. 
Read Me


This is a learning experience, not a reference book. We deliberately stripped out everything 
that might get in the way of learning whatever it is we’re working on at that point in the 
book. And the first time through, you need to begin at the beginning, because the book 
makes assumptions about what you’ve already seen and learned. 


We begin by teaching basic ways of representing and summarizing 
data, then move on to probability distributions, and then more 
advanced techniques such as hypothesis testing. 

While later topics are important, the first thing you need to tackle is fundamental building 
blocks such as charting, averages, and measures of variability. So we begin by showing you 
basic statistical problems that you actually solve yourself. That way you can immediately 
do something with statistics, and you will begin to get excited about it. Then, a bit later in 
the book, we show you how to use probability and probability distributions. By then you’ll 
have a solid grasp of statistics fundamentals, and can focus on learning the concepts. After that, 
we show you how to apply your knowledge in more powerful ways, such as how to conduct 
hypothesis tests. We teach you what you need to know at the point you need to know it 
because that’s when it has the most value. 


We cover the same general set of topics that are on the AP and A 
Level curriculum. 

While we focus on the overall learning experience rather than exam preparation, we 
provide good coverage of the AP and A Level curriculum. This means that while you work 
your way through the topics, you’ll gain the deep understanding you need to get a good 
grade in whatever exam it is you’re taking. This is a far more effective way of learning 
statistics than learning formulae by rote, as you’ll feel confident about what you need when, 
and how to use it. 

We help you out with online resources. 

Our readers tell us that sometimes you need a bit of extra help, so we provide online 
resources, right at your fingertips. We give you an online forum where you can go to seek 
help, online papers, and other resources too. The starting point is 

http://www.headfirstlabs.com/books/hfstats/
The activities are NOT optional.


The exercises and activities are not add-ons; they’re part of the core content of the book. 
Some of them are to help with memory, some are for understanding, and some will help 
you apply what you’ve learned. Don’t skip the exercises. The crossword puzzles are
the only thing you don’t have to do, but they’re good for giving your brain a chance to 
think about the words and terms you’ve been learning in a different context. 


The redundancy is intentional and important. 

One distinct difference in a Head First book is that we want you to really get it. And we 
want you to finish the book remembering what you’ve learned. Most reference books 
don’t have retention and recall as a goal, but this book is about learning, so you’ll see some
of the same concepts come up more than once. 


The Brain Power and Brain Barbell exercises don’t have 
answers. 


For some of them, there is no right answer, and for others, part of the learning 
experience of the activities is for you to decide if and when your answers are right. In 
some of the Brain Power and Brain Barbell exercises, you will find hints to point you in 
the right direction. 
Sadly, having read this book, that turned out not to be the
case. Andy spends most of his time now, worrying about

Michael J. Prerau is a researcher in Computational

how the neurons encode information in the brain. He


1 visualizing information




First Impressions



Can’t tell your facts from your figures? 

Statistics help you make sense of confusing sets of data. They make the 
complex simple. And when you’ve found out what’s really going on, you 
need a way of visualizing it and telling everyone else. So if you want to 
pick the best chart for the job, grab your coat, pack your best slide rule, and 
join us on a ride to Statsville. 




Statistics are everywhere 

Everywhere you look you can find statistics, whether you’re browsing the 
Internet, playing sports, or looking through the top scores of your favorite 
video game. But what actually is a statistic? 

Statistics are numbers that summarize raw facts and figures in some 
meaningful way. They present key ideas that may not be immediately 
apparent by just looking at the raw data, and by data, we mean facts or figures 
from which we can draw conclusions. As an example, you don’t have to 
wade through lots of football scores when all you want to know is the league 
position of your favorite team. You need a statistic to quickly give you the 
information you need. 

The study of statistics covers where statistics come from, how to calculate them, 
and how you can use them effectively. 

[Figure: the stages shown are Gather data, Analyze, and Draw conclusions.]


But why learn statistics?

Understanding what’s really going on with statistics empowers you. If you 
really get statistics, you’ll be able to make objective decisions, make accurate 
predictions that seem inspired, and convey the message you want in the 
most effective way possible. 

Statistics can be a convenient way of summarizing key truths about data, 
but there’s a dark side too. 



Statistics are based on facts, but even so, they can sometimes be misleading. 
They can be used to tell the truth — or to lie. The problem is how do you 
know when you’re being told the truth, and when you’re being told lies? 

Having a good understanding of statistics puts you in a strong position. 
You’re much better equipped to tell when statistics are inaccurate or 
misleading. In other words, studying statistics is a good way of making sure 
you don’t get fooled. 


As an example, take a look at the profits made by a company in the latter half 
of last year. 


Month               Jul   Aug   Sep   Oct   Nov   Dec
Profit (millions)   2.0   2.1   2.2   2.1   2.3   2.4


The profit’s holding
steady, but it’s
nothing special.


How can there be two interpretations of the same 
set of data? Let’s take a closer look. 
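
One way to see how both readings can come from the same six figures is to summarize them in more than one way. The sketch below is only an illustration: the variable names are our own, and the figures are the ones in the profit table above.

```python
# Monthly profit in millions, July to December, copied from the table above.
profits = [2.0, 2.1, 2.2, 2.1, 2.3, 2.4]

# Absolute change over the six months, in millions.
absolute_change = profits[-1] - profits[0]

# The same change expressed relative to the July figure.
percent_change = 100 * absolute_change / profits[0]

# The average monthly profit over the period.
mean_profit = sum(profits) / len(profits)

print(f"Change from Jul to Dec: {absolute_change:.1f} million")
print(f"Relative change:        {percent_change:.0f}%")
print(f"Mean monthly profit:    {mean_profit:.2f} million")
```

A change of a few hundred thousand against a profit of about two million sounds steady. The same change quoted as a percentage sounds a lot more dramatic. Neither number is wrong; they just emphasize different things.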
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differences in data presentation 


Both of these charts are based on the same information, but they look wildly different. What’s going on?


Company Profit per Month 




No, this 
profit's amazing. 
Look at it soar! 


A tale of two charts 

So how can we explore these two different interpretations of the same data? 
What we need is some way of visualizing them. If you need to visualize 
information, there’s no better way than using a chart or graph. They can be a 
quick way of summarizing raw information and can help you get an impression 
of what’s going on at a glance. But you need to be careful because even the 
simplest chart can be used to subtly mislead and misdirect you. 

Here are two time graphs showing a company’s profits for six months. They’re
both based on the same information, so why do they look so different? They
give drastically different versions of the same information.



[Two charts: Company Profit per Month; vertical axis: Profit (millions of dollars); horizontal axis: Month]



Sharpen your pencil


Take a look at the two charts on the facing page. What would you 
say are the key differences? How do they give such different first 
impressions of the data? 


There are no Dumb Questions


Q: Why not just go on the data? Why chart it?

A: Sometimes it’s difficult to see what’s really going on just by
looking at the raw data. There can be patterns and trends in the data,
but these can be very hard to spot if you’re just looking at a heap of
numbers. Charts give you a way of literally seeing patterns in your
data. They allow you to visualize your data and see what’s really
going on in a quick glance.


Q: What’s the difference between information and data?

A: Data refers to raw facts and figures that have been collected.
Information is data that has some sort of added meaning.

As an example, take the numbers 5, 6, and 7. By themselves, these
are just numbers. You don’t know what they mean or represent.
They’re data. If you’re then told that these are the ages of three
children, you have information, as the numbers are now meaningful.
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sharpen your pencil solution 



Take a look at the two charts. What would you say are the key differences? 
How do they give such different first impressions of the data? 


Both charts are based on the same data, but they send a different message.

Look at the vertical axes of each chart.

[Chart 1: Company Profit per Month; vertical axis starts at 0.0; horizontal axis: Month, Jul to Dec]

The first chart has its vertical axis start at 0, with the profit for each
month plotted against this.

[Chart 2: Company Profit per Month; vertical axis starts at 2.0; horizontal axis: Month, Jul to Dec]

The second chart gives a different impression by making the
vertical axis start at a different place and adjusting the scale
accordingly. At a glance, the profits appear to be rising dramatically
each month. It’s only when you look closer that you see what’s
really going on.

The axis for this chart starts at 2.0, not 0. No wonder the profit
looks so awesome.


Why should I care about charts?
Chart software can handle everything
for you; that’s what it’s there for.


Software can’t think for you. 

Chart software can save you a lot of time and produce effective 
charts, but you still need to understand what’s going on. 

At the end of the day, it’s your data, and it’s up to you to choose the 
right chart for the job and make sure your data is presented in the 
most effective way possible and conveys the message you want. 

Software can translate data into charts, but it’s up to you to make 
sure the chart is right. 


Manic Mango needs some charts 

One company that needs some charting expertise is Manic 
Mango, an innovative games company that is taking the world 
by storm. The CEO has been invited to deliver a keynote
presentation at the next worldwide games expo. He needs some 
quick, slick ways of presenting data, and he’s asked you to come 
up with the goods. There’s a lot riding on this. If the keynote 
goes well, Manic Mango will get extra sponsorship revenue, and 
you’re bound to get a hefty bonus for your efforts. 

The first thing the CEO wants to be able to do is compare the
number of units sold for each game genre. He’s started
off by plugging the data he has through some charting software,
and here are the results:



[Pie chart: Units Sold per Genre. Sport 27,500; Strategy 11,500; Action 6,000; Shooter 3,500; Other 1,500]





Take a good look at the pie chart that the CEO has produced. What does 
each slice represent? What can you infer about the relative popularity of 
different video game genres? 
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anatomy of a pie chart 


The humble pie chart 

Pie charts work by splitting your data into distinct groups or categories. 
The chart consists of a circle split into wedge-shaped slices, and each slice 
represents a group. The size of each slice is proportional to how many are 
in each group compared with the others. The larger the slice, the greater 
the relative popularity of that group. The number in a particular group is 
called the frequency. 

Pie charts divide your entire data set into distinct groups. This means that
if you add together the frequencies of all the slices, you get the total for
the whole data set, which is 100% of it.
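As a rough illustration of how slice sizes follow from frequencies, here is a minimal Python sketch. It uses the units-sold figures from this chapter; the function name `slice_shares` and the idea of reporting both a percentage and an angle are assumptions made for the example, not something the chart software is known to do.

```python
# Units sold per genre, taken from the Manic Mango table in this chapter.
units_sold = {
    "Sports": 27500,
    "Strategy": 11500,
    "Action": 6000,
    "Shooter": 3500,
    "Other": 1500,
}


def slice_shares(frequencies):
    """Return each group's share of the whole as a percentage and a slice angle.

    Hypothetical helper: the slice for a group is proportional to its
    frequency divided by the total frequency of all groups.
    """
    total = sum(frequencies.values())
    return {
        group: {"percent": 100 * freq / total, "degrees": 360 * freq / total}
        for group, freq in frequencies.items()
    }


shares = slice_shares(units_sold)
for genre, share in shares.items():
    print(f"{genre:<9}{share['percent']:5.1f}%  {share['degrees']:6.1f} degrees")
```

The percentages across all slices add up to 100, and the angles add up to a full circle of 360 degrees, which is another way of saying that the slices together cover the entire data set.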

Let’s take a closer look at our pie chart showing the number of units sold 
per genre: 


[Pie chart: Units Sold per Genre]

The Other slice is much smaller than the others, so sales are a lot lower.

The Sport slice is much larger than all the others, meaning that the
frequency is highest for this category.


Genre      Units sold
Sports     27,500
Strategy   11,500
Action     6,000
Shooter    3,500
Other      1,500


So when are pie charts useful? 





We’ve seen that the size of each slice represents the relative
frequency of each group of data you’re showing. Because of
this, pie charts can be useful if you want to compare basic proportions.
It’s usually easy to tell at a glance which groups have a high frequency
compared with the others. Pie charts are less useful if all the slices have
similar sizes, as it’s difficult to pick up on subtle differences between
the slice sizes.


Vital Statistics: Frequency

Frequency describes how many items there are in a particular
group or interval. It’s like a count of how many there are.


So what about the pie chart that the Manic Mango CEO has created? 


Chart failure 


Creating a pie chart worked out so well for displaying the units sold per genre
that the CEO’s decided to create another to chart consumer satisfaction with
Manic Mango’s games. The CEO needs a chart that will allow him to compare
the percentage of satisfied players for each game genre. He’s run the data
through the charting software again, but this time he’s not as impressed.


[Pie chart: % Players Satisfied per Genre. Sports 99%; Strategy 90%; Action 85%; Shooter 95%; Other 85%]


CEO: What happened here? All the
slices are the same size, but the
percentages are all different and
are much larger than the slices. Can
you help me fix this chart? Now?


Pie charts are used to compare the
proportions of different groups or
categories, but in this case there’s little
variation between each group.

It’s difficult to take in at a glance which category has the
highest level of player satisfaction.

It’s also generally confusing to label pie charts with
percentages that don’t relate to the overall proportion of
the slice. As an example, the Sports slice is labelled 99%,
but it only fills about 20% of the chart. Another problem
is that we don’t know whether there’s an equal number
of responses for each genre, so we don’t know whether
it’s fair to compare genre satisfaction in this way.


Pie charts show proportions.





Take a look at the data, and think about the problems there are with this chart. 
What would be a better sort of chart for this kind of information? 
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two types of bar charts 


Bar charts can allow for more accuracy

A better way of showing this kind of data is with a bar chart. Just like pie
charts, bar charts allow you to compare relative sizes, but the advantage
of using a bar chart is that it allows for a greater degree of precision.
Bar charts are ideal in situations where categories are roughly the same size, as
you can tell with far greater precision which category has the highest
frequency. It makes it easier for you to see small differences.

On a bar chart, each bar represents a particular category, and the length
of the bar indicates the value. The longer the bar, the greater the value. All
the bars have the same width, which makes it easier to compare them.

Bar charts can be drawn either vertically or horizontally.


Vertical bar charts

Vertical bar charts show categories on the horizontal axis, and either
frequency or percentage on the vertical axis. The height of each bar
indicates the value of its category. Here’s an example showing the sales
figures in units for five regions, A, B, C, D, and E:


Region   Sales (units)
A        1,000
B        5,000
C        7,500
D        8,000
E        9,500


[Vertical bar chart: Sales per Region in Units; horizontal axis: Region, A to E; vertical axis: Sales (units), 0 to 10000]


Horizontal bar charts 


Horizontal bar charts are just like vertical bar charts except that the axes 
are flipped round. With horizontal bar charts, you show the categories on 
the vertical axis and the frequency or percentage on the horizontal axis. 



Vertical bar charts tend to be more common, but horizontal bar charts 
are useful if the names of your categories are long. They give you lots of 
space for showing the name of each category without having to turn the 
bar labels sideways. 






The vertical bar chart shows frequency, 
and the horizontal bar chart shows 
percentages. When should I use frequencies 
and when should I use percentages? 


It depends on what message you want to convey. 

Let’s take a closer look. 
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a look at scale 


It’s a matter of scale

Understanding scale allows you to create powerful bar charts that pick out the 
key facts you want to draw attention to. But be careful — scale can also conceal 
vital facts about your data. Let’s see how. 

Using percentage scales 

Let’s start by taking a deeper look at the bar chart showing player satisfaction 
per game genre. The horizontal axis shows player satisfaction as a percentage, 
the number of people out of every hundred who are satisfied with this genre. 


[Horizontal bar chart: % Players Satisfied per Genre; horizontal axis: % Satisfied]


The purpose of this chart is to allow us to compare different percentages and 
also read off percentages from the chart. 

There’s just one problem — it doesn’t tell us how many players there are for 
each genre. This may not sound important, but it means that we have no idea 
whether this reflects the views of all players, some of them, or even just a 
handful. In other words, we don’t know how representative this is of players as a 
whole. The golden rule for designing charts that show percentages is to try and 
indicate the frequencies, either on the chart or just next to it. 
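To make the golden rule concrete, here is a small Python sketch that always reports a percentage together with the counts behind it. The function name `describe` and the survey numbers are made up for illustration; the chapter does not give the number of responses per genre.

```python
def describe(satisfied, asked):
    """Report a satisfaction percentage alongside the frequencies behind it.

    Hypothetical helper: `satisfied` and `asked` are counts of players.
    """
    percent = 100 * satisfied / asked
    return f"{percent:.0f}% satisfied ({satisfied} of {asked} players asked)"


# Two invented surveys with the same percentage but very different sizes.
print(describe(9, 10))
print(describe(9000, 10000))
```

Both lines report 90%, but only the counts show that the first result rests on ten players while the second rests on ten thousand.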



Watch it!

Be very wary if you’re given percentages with no frequencies, or a
frequency with no percentage.

Sometimes this is a tactic used to hide key facts about the underlying data, as just
based on a chart, you have no way of telling how representative it is of the data. You
may find that a large percentage of people prefer one particular game genre, but that
only 10 people were questioned. Alternatively, you might find that 10,000 players like
sports games most, but by itself, you can’t tell whether this is a high or low proportion
of all game players.


Using frequency scales 

You can show frequencies on your scale instead of percentages. This 
makes it easy for people to see exactly what the frequencies are and 
compare values. 


[Horizontal bar chart: Number of Players Satisfied per Genre; horizontal axis: Number Satisfied]


Normally your scale should start at 0, but watch out! Not every chart does
this, and as you saw earlier with the two company profit charts, using a scale
that doesn’t start at 0 can give a different first impression of your data. This
is something to watch out for on other people’s charts, as it’s very easy to miss
and can give you the wrong impression of the data.
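The effect of the starting point can be sketched numerically. The Python example below uses the monthly profit figures from earlier in the chapter and works out how far up the vertical axis each point would be drawn. The axis maximum of 2.5 and the function name `plotted_heights` are assumptions made for illustration.

```python
# Monthly profit in millions, from the company profit table in this chapter.
profits = {"Jul": 2.0, "Aug": 2.1, "Sep": 2.2, "Oct": 2.1, "Nov": 2.3, "Dec": 2.4}


def plotted_heights(values, axis_min, axis_max):
    """Fraction of the vertical axis each value reaches on a given scale."""
    span = axis_max - axis_min
    return {label: (value - axis_min) / span for label, value in values.items()}


# Assumed axis maximum of 2.5 for both charts; only the starting point differs.
from_zero = plotted_heights(profits, 0.0, 2.5)
from_two = plotted_heights(profits, 2.0, 2.5)

for month in profits:
    print(f"{month}: axis from 0 -> {from_zero[month]:.2f}, "
          f"axis from 2.0 -> {from_two[month]:.2f}")
```

With the axis starting at 0, every point sits between 80% and 96% of the way up, so the line looks flat. With the axis starting at 2.0, the same points run from the very bottom to about 80% of the way up, so the line appears to soar.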


So are you telling 
me that I have to 
choose between showing 
frequency or percentages? 
What if I want both? 


There are ways of drawing bar charts that give 
you more flexibility. 

The problem with these bar charts is that they show either the 
number of satisfied players or the percentage, and they only show 
satisfied players. 

Let’s take a look at how we can get around this problem. 
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two data sets on one bar chart 


Dealing with multiple sets of data

With bar charts, it’s actually really easy to show more than one set of data on 
the same chart. As an example, we can show both the percentage of satisfied 
players and the percentage of dissatisfied players on the same chart. 


The split-category bar chart 

One way of tackling this is to use one bar 
for the frequency of satisfied players and 
another for those dissatisfied, for each genre. 
This sort of chart is useful if you want to 
compare frequencies, but it’s difficult to 
see proportions and percentages. 


The segmented bar chart 

If you want to show frequencies and 
percentages, you can try using a segmented 
bar chart. For this, you use one bar for each 
category, but you split the bar proportionally. 
The overall length of the bar reflects the 
total frequency. 

This sort of chart allows you to quickly see 
the total frequency of each category — in 
this case, the total number of players for 
each genre — and the frequency of player 
satisfaction. You can see proportions at a 
glance, too. 


[Split-category bar chart: Player Satisfaction per Genre; horizontal axis: Frequency; legend: Satisfied, Dissatisfied]

[Segmented bar chart: Player Satisfaction per Genre; horizontal axis: Frequency; legend: Satisfied, Dissatisfied]


Sharpen your pencil

Here’s another chart generated by the software. Which genre sold
the most in 2007? How did this genre fare in 2006?

[Horizontal bar chart: Sales per Genre; categories: Sports, Strategy, Action, Shooter, Other; horizontal axis: Sales, 0 to 30000]


The CEO needs another chart for the keynote presentation. Here’s the data; see if you 
can sketch the bar chart. 


Continent       Sales (units)
North America   1,500
South America   500
Europe          1,500
Asia            2,000
Oceania         1,000
Africa          500
Antarctica      1
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exercise solutions 


Sharpen your pencil Solution

Here’s another chart generated by the software. Which genre sold
the most in 2007? How did this genre fare in 2006?

[Horizontal bar chart: Sales per Genre; categories: Sports, Strategy, Action, Shooter, Other; horizontal axis: Sales, 0 to 30000]

The Sports genre sold the most in 2007. It sold 27,500 units.
In 2006 it didn’t sell as many. In 2006, the Strategy genre sold
more units than any other genre.


Exercise Solution

The CEO needs another chart for the keynote presentation. Here’s the data; see if you
can create the chart.


Continent       Sales (units)
North America   1,500
South America   500
Europe          1,500
Asia            2,000
Oceania         1,000
Africa          500
Antarctica      1


[Horizontal bar chart: Sales per Continent; vertical axis: Continent; horizontal axis: Sales (units), 0 to 2000]


Your bar charts rock 


The CEO is thrilled with the bar charts
you’ve produced, but there’s more data
he needs to present at the keynote.


Nice work! Those charts are going to be a big hit at
the expo. I’ve got another assignment for you. We’ve
been testing a new game with a group of volunteers,
and we need a chart to show the breakdown of
scores per game. Here’s the data:


Score     Frequency
0-199     5
200-399   29
400-599   56
600-799   17
800-999   3

The data is broken into groups. For example, 5 scores fall between 0 and 199.



This data looks different from the other
types of data we’ve seen so far. I wonder if
that means we treat it differently?






Look back through the chapter. How do you think 
this type of data is different? What impact do you 
think this could have on charts? 
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categorical and numerical data 


Categories vs. numbers 

When you’re working with charts, one of the key things you need to figure out is what sort 
of data you’re dealing with. Once you’ve figured that out, you’ll find it easier to make key 
decisions about what chart you need to best represent your data. 


Categorical or qualitative data 

Most of the data we’ve seen so far is categorical. The 
data is split into categories that describe qualities or 
characteristics. For this reason, it’s also called qualitative 
data. An example of qualitative data is game genre; each 
genre forms a separate category. 


The key thing to remember with qualitative data is that 
the data values can’t be interpreted as numbers. 



Examples: breed of dog, type of dessert


Numerical or quantitative data 

Numerical data, on the other hand, deals with numbers. 
It’s data where the values have meaning as numbers, and 
that involves measurements or counts. Numerical data is 
also called quantitative data because it describes quantities. 




time 


So what impact does this have on the chart for Manic Mango? 


Dealing with grouped data

The latest set of data from the Manic Mango CEO is numeric and,
what’s more, the scores are grouped into intervals. So what’s the best
way of charting data like this?


That’s easy, don’t we just
use a bar chart like we did
before? We can treat each
group as a separate category.


We could, but there’s a better way. 

Rather than treat each range of scores as a separate category, we 
can take advantage of the data being numeric, and present the data 
using a continuous numeric scale instead. This means that instead 
of using bars to represent a single item, we can use each bar to 
represent a range of scores. 

To do this, we can create a histogram. 

Histograms are like bar charts but with two key differences. The 
first is that the area of each bar is proportional to the frequency, and 
the second is that there are no gaps between the bars on the chart. 
Here’s an example of a histogram showing the average number of 
games bought per month by households in Statsville: 


[Histogram: No. Games Bought per Month; horizontal axis: No. Games, 0 to 6]


The scores are numeric and grouped into intervals:

Score     Frequency
0-199     5
200-399   29
400-599   56
600-799   17
800-999   3
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building a histogram 


To make a histogram, start by finding bar widths 


The first step to creating a histogram is to look at each of the intervals and 
work out how wide each of them needs to be, and what range of values 
each one needs to cover. While doing this, we need to make sure that there 
will be no gaps between the bars on the histogram. 

Let’s start with the first two intervals, 0—199 and 200—399. At face value, 
the first interval finishes at score 199, and the second starts at score 200. 
The problem with plotting it like this, however, is that it would leave a gap 
between score 199 and 200, like this: 


[Figure: the bars for 0-199 and 200-399, drawn with a gap between 199 and 200]


Histograms shouldn’t have gaps between the bars, so to get around this, 
we extend their ranges slightly. Instead of one interval ending at score 
199 and the next starting at score 200, we make the two intervals meet 
at 199.5, like this: 




[Figure: the two bars now meet at a single boundary, 199.5]

Doing this forms a single boundary and makes sure that there are no gaps 
between the bars on the histogram. If we complete this for the rest of the 
intervals, we get the following boundaries: 



Score     Boundaries
0-199     -0.5 to 199.5
200-399   199.5 to 399.5
400-599   399.5 to 599.5
600-799   599.5 to 799.5
800-999   799.5 to 999.5


Each interval covers 200 scores, and the width of each interval is 200. Each 
interval has the same width. 

As all the intervals have the same width, we create the histogram by drawing 
vertical bars for each range of scores, using the boundaries to form the start 
and end point of each bar. The height of each bar is equal to the frequency. 
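The boundary calculation described above can be sketched in a few lines of Python. This is only an illustration under the chapter’s assumptions: scores are whole numbers, and each interval starts one score after the previous one ends. The function name `class_boundaries` is made up for the example.

```python
# Score intervals and frequencies from the Manic Mango table.
intervals = [(0, 199), (200, 399), (400, 599), (600, 799), (800, 999)]
frequencies = [5, 29, 56, 17, 3]


def class_boundaries(intervals):
    """Extend each whole-number interval by half a unit on either side.

    Assumes values are rounded to the nearest whole number, so that
    neighbouring intervals meet midway between their end points.
    """
    return [(low - 0.5, high + 0.5) for low, high in intervals]


boundaries = class_boundaries(intervals)
widths = [upper - lower for lower, upper in boundaries]

for (lower, upper), width, freq in zip(boundaries, widths, frequencies):
    print(f"{lower:6.1f} to {upper:6.1f}  width {width:5.1f}  frequency {freq}")
```

Each bar ends exactly where the next one starts, and every width comes out as 200, which is why the bar height can simply equal the frequency for this data.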
Here’s a reminder of the data for Manic Mango.


Score     Frequency
0-199     5
200-399   29
400-599   56
600-799   17
800-999   3


See if you can use the class boundaries to create a histogram for this data. 
Remember, the frequency goes on the vertical axis. 



Q: So is a histogram basically for grouped numeric data?

A: Yes it is. The advantage of a histogram is that because
it’s numeric, you can use it to show the width of each interval as
well as the frequency.

Q: What about if the intervals are different widths? Can
you still use a histogram?

A: Absolutely. It’s more common for the interval widths to be
equal size, but with a histogram they don’t have to be. There
are a couple more steps you need to go through to create a
histogram with unequal sized intervals, but we’ll show you that
very soon.

Q: Why shouldn’t histograms have gaps between the bars?

A: There are at least two good reasons. The first is to show
that there are no gaps in the values, and that every value is
covered. The second is so that the width of the interval reflects
the range of the values you’re covering. As an example, if we
drew the interval 0-199 as extending from value 0 to value 199,
the width on the chart would only be 199 - 0 = 199, not the
200 scores the interval actually covers.


Q: So why do we make the bars meet midway between
the two?

A: The bars have to meet, and it’s usually at the midway
point, but it all comes down to how you round your values. When
you round values, you normally round them to the nearest whole
number. This means that the range of values from -0.5 to 0.5 all
round to 0, and so when we show 0 on a histogram, we show it
using the range of values from -0.5 to 0.5.

Q: Are there any exceptions to this?

A: Yes, age is one exception. If you have to represent the age
range 18-19 on a histogram, you would normally represent this
using an interval that goes from 18 to 20. The reason for this is
that we typically classify someone as being 19, for example, up
until their 20th birthday. In effect, we round ages down.


BULLET POINTS - 

■ The frequency is a statistical way of saying how many 
items there are in a category. 

■ Pie charts are good for showing basic proportions. 

■ Bar charts give you more flexibility and precision. 

■ Numerical data deals with numbers and quantities; 
categorical data deals with words and qualities. 

■ Horizontal bar charts are used for categorical data,
particularly where the category names are long.

■ Vertical bar charts are used for numerical data, or
categorical data if the category names are short.

■ You can show multiple sets of data on a bar chart,
and you have a choice of how to do this. You can
compare frequencies by showing related bars
side-by-side on a split-category bar chart. You can show
proportions and total frequencies by stacking the bars
on top of each other on a segmented bar chart.

■ Bar chart scales can show either percentages or 
frequencies. 

■ Each chart comes in a number of different varieties. 
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a problem with unequal interval widths 


Manic Mango needs another chart 

The CEO is very pleased with the histogram you’ve created for him, so
much so that he wants you to create another one. This time,
he wants a chart showing how long Manic Mango players tend to play
online games over a 24-hour period. Here’s the data:




Hours   Frequency
0-1     4,300
1-3     6,900
3-5     4,900
5-10    2,000
10-24   2,100


There’s something funny
about that data. It’s
grouped like last time, but
the intervals aren’t all the
same width.


He’s right, the interval widths aren’t all equal. 

If you take a look at the intervals, you can see that they’re different widths. 
As an example, the 10—24 range covers far more hours than the 0—1 range. 

If we had access to the raw data, we could look at how we could construct 
equal width intervals, but unfortunately this is all the data we have. We 
need a way of drawing a histogram that makes allowances for the data 
having different widths. 

For histograms, the frequency is proportional to the 
area of each bar. How would you use this to create 
a histogram for this data? What do you need to be 
aware of? 



I think we can just create this in the same way
we did before; it’s no big deal. We draw bars
on a numeric scale; it’s just that this time the
bars are different widths.


Do you think she’s right? 

Here’s a sketch of the chart, using frequency on the vertical scale 
and drawing bar widths proportional to their interval size. Do you 
see any problems? 


[Chart: Hours Spent Gaming per Day; horizontal axis: Hours, 0 to 24; vertical axis: Frequency; bar widths proportional to interval size]

A histogram's bar area must be proportional to frequency 

The problem with this chart is that making the width of each bar 
reflect the width of each interval has made some of the bars look 
disproportionately large. Just glancing at the chart, you might be left 
with a misleading impression about how many hours per day people 
really play games for. As an example, the bar that takes up the largest 
area is the bar showing game play of 10-24 hours, even though most 
people don’t play for this long. 

As this is a histogram, we need to make the bar area proportional to the 
frequency it represents. As the bars have unequal widths, what should 
we do to the bar height? 
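To see why the sketch misleads, it helps to compare how much of the chart each bar covers with how many players it actually represents. The Python sketch below does this for the hours data; the variable names are made up for illustration.

```python
# Hours intervals and frequencies from the Manic Mango table.
hours = [(0, 1), (1, 3), (3, 5), (5, 10), (10, 24)]
frequencies = [4300, 6900, 4900, 2000, 2100]

# In the naive chart, height = frequency and width = interval width,
# so the area of each bar is width x frequency.
naive_areas = [(upper - lower) * freq for (lower, upper), freq in zip(hours, frequencies)]

total_area = sum(naive_areas)
total_players = sum(frequencies)

for (lower, upper), area, freq in zip(hours, naive_areas, frequencies):
    print(f"{lower:>2}-{upper:<2} hours: "
          f"{100 * area / total_area:4.1f}% of the chart area, "
          f"{100 * freq / total_players:4.1f}% of the players")
```

The 10-24 bar ends up with the largest area of all, roughly 44% of the chart, even though it stands for only about 10% of the players.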


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adjusting bar area 


Make the area of histogram bars proportional to frequency 


Up until now, we’ve been able to use the height of each bar to 
represent the frequency of a particular number or category. 

This time around, we’re dealing with grouped numeric data where 
the interval widths are unequal. We can make the width of each bar 
reflect the width of each interval, but the trouble is that having bars 
of different widths affects the overall area of each bar. 

We need to make sure the area of each bar is proportional to its 
frequency. This means that if we adjust bar width, we also need to 
adjust bar height. That way, we can change the widths of the bars so 
that they reflect the width of the group, but we keep the size of each 
bar in line with its frequency. 

Let’s go through how to create this new histogram. 


For histograms, the frequency is represented by bar area.


Step 1: Find the bar widths

We find how wide our bars need to be by looking at the range of values
they cover. In other words, we need to figure out how many full hours
are covered by each group.

Let’s take the 1-3 group. This group covers 2 full hours, 1-2 and 2-3.
This means that the width of the bar needs to be 2, with boundaries of
1 and 3.


If we calculate the rest of the widths, we get: 


Hours   Frequency   Width
0-1     4,300       1
1-3     6,900       2
3-5     4,900       2
5-10    2,000       5
10-24   2,100       14


Now that we’ve figured out the bar widths, we can
move on to working out the heights.


Step 2: Find the bar heights


Now that we have the widths of all the groups, we can use these to find the
heights the bars need to be. Remember, we need to adjust the bar heights
so that the overall area of each bar is proportional to the group’s frequency.


First of all, let’s take the area of each bar. We’ve said that frequency and
area are equivalent. As we already know what the frequency of each group
is, we know what the areas should be too:

    Area of bar = Frequency of group


Now each bar is basically just a rectangle, which means that the area of
each bar is equal to the width multiplied by the height. As the area gives
us the frequency, this means:


    Frequency = Width of bar x Height of bar


We found the widths of the bars in the last step, which means that we
can use these to find what height each bar should be. In other words,


    Height of bar = Frequency / Width of bar


The height of the bar is used to measure how concentrated the
frequency is for a particular group. It’s a way of measuring how
densely packed the frequency is, a way of saying how thick or thin
on the ground the numbers are. The height of the bar is called the
frequency density.


Sharpen your pencil


What should the height of each bar be? Complete the table.


Hours   Frequency   Width   Height (Frequency Density)
0-1     4,300       1       4,300 ÷ 1 = 4,300
1-3     6,900       2
3-5     4,900       2
5-10    2,000       5
10-24   2,100       14
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draw the histogram 



What should the height of each bar be? Complete the table.


Hours   Frequency   Width   Height (Frequency Density)
0-1     4,300       1       4,300 ÷ 1 = 4,300
1-3     6,900       2       6,900 ÷ 2 = 3,450
3-5     4,900       2       4,900 ÷ 2 = 2,450
5-10    2,000       5       2,000 ÷ 5 = 400
10-24   2,100       14      2,100 ÷ 14 = 150
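The same arithmetic can be checked with a short Python sketch. It assumes the widths found in Step 1 and applies height = frequency ÷ width; the function name `frequency_density` is made up for illustration.

```python
# Hours intervals, frequencies, and widths from the tables in this chapter.
groups = ["0-1", "1-3", "3-5", "5-10", "10-24"]
frequencies = [4300, 6900, 4900, 2000, 2100]
widths = [1, 2, 2, 5, 14]


def frequency_density(frequency, width):
    """Height of a histogram bar: frequency divided by the width of its group."""
    return frequency / width


densities = [frequency_density(f, w) for f, w in zip(frequencies, widths)]

for group, width, density in zip(groups, widths, densities):
    # Width x height gives back the frequency, so area stays proportional to it.
    print(f"{group:>5}: height {density:7.1f}, area {width * density:7.1f}")
```

Multiplying each height by its width returns the original frequency, which confirms that the area of every bar matches the number of players in its group.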


Step 3: Draw your chart, a histogram

Now that we’ve worked out the widths and heights of each bar, we 
can draw the histogram. We draw it just like before, except that this 
time, we use frequency density for the vertical axis and not frequency. 

Here’s our revised histogram. 
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visualizing information 


Frequency density refers to the concentration of values in 
data. It’s related to frequency, but it’s not the same thing. 
Here’s an analogy to demonstrate the relationship between 
the two. 

Imagine you have a quantity of juice that you’ve poured 
into a glass like this: 


frequency Density Up Cl^se 



all youv juidc m ^ ss * 
I 七 domes uf *bo 七 iVis level. 



What if you then pour the same quantity of juice into a 
different sized glass, say one that’s wider? What happens to the 
level of the juice? This time the glass is wider, so the level 
the juice comes up to is lower. 

The level of the juice varies in line with the width of the 
glass; the wider the glass, the lower the level. The converse is 
true too; the narrower the glass, the higher the level of juice. 



The glass is wider, so the level isn't as high.


So what does juice have to do with frequency density? 


Juice = Frequency

Imagine that instead of pouring juice into glasses, you’re “pouring” 
frequency into the bars on your chart. Just as you know the width of the 
glass, you know what width your bars are. And just like the space the juice 
occupies in the glass (width x height) tells you the quantity of juice in the 
glass, the area of the bar on the graph is equivalent to its frequency. 

The frequency density is then equal to the height of the bar. Keeping 
with our analogy, it’s equivalent to the level your juice comes to in each 
glass. Just as a wider glass means the juice comes to a lower level, a wider 
bar means a lower frequency density. 


BULLET POINTS

■ Frequency density relates to how concentrated the
frequencies are for grouped data. It’s calculated using

Frequency density = Frequency ÷ Group width

■ A histogram is a chart that specializes in grouped 
data. It looks like a bar chart, but the height of each bar 
equates to frequency density rather than frequency. 


■ When drawing histograms, the width of each bar is 
proportional to the width of its group. The bars are 
shown on a continuous numeric scale. 

■ In a histogram, the frequency of a group is given by the 
area of its bar. 

■ A histogram has no gaps between its bars. 


there are no
Dumb Questions


Why do we use area to represent frequency when 
we’re graphing histograms? 

It’s a way of making sure the relative sizes of each group 
stay in proportion to the data, and stay honest. With grouped 
data, we need a visual way of expressing the width of each 
group and also its frequency. Changing the width of the bars is 
an intuitive way of reflecting the group range, but it has the side 
effect of making some of the bar sizes look disproportionate. 

Adjusting the bar height and using the area to represent 
frequency is a way around this. This way, no group is 
misrepresented by taking up too much or too little space. 

What’s frequency density again? 

Frequency density is a way of indicating how 
concentrated values are in a particular interval. It gives you a 
way of comparing different intervals that may be different widths. 
It makes the frequency proportional to the area of a bar, rather 
than height. 

To find the frequency density, take the frequency of an interval, 
and divide it by the width. 


If I have grouped numeric data, but all the intervals 
are the same width, can I use a normal bar chart? 

Using a histogram will better represent your data, as 
you’re still dealing with grouped data. You really want your 
frequency to be proportional to its area, not height. 

Do histograms have to show grouped data? Can 
you use them for individual numbers as well as groups of 
numbers? 

Yes, you can. The key thing to remember is to make sure 
there are no gaps between the bars and that you make each 
bar 1 wide. Normally you do this by positioning your number in 
the center of the bar. 

As an example, if you wanted to draw a bar representing the 
individual number 1, then you’d draw a bar ranging from 0.5 to 
1.5, with 1 in the center. 




Here’s a histogram representing the number of levels completed in each game of Cows Gone 
Wild. How many games have been played in total? Assume each level is a whole number. 

No. Levels Completed per Game 



Represents 10 games 
Exercise Solution


Here’s a histogram representing the number of levels completed in each game of Cows Gone 
Wild. How many games have been played in total? Assume each level is a whole number. 


No. Levels Completed per Game 



We need to find the total number of games played, which means we need to find the total frequency.

The total frequency is equal to the area of each bar added together. In other words, we multiply the width
of each bar by its frequency density to get the frequency, and then add the whole lot up together.

Level    Width    Frequency Density    Frequency
0        1        10                   1 x 10 = 10
1        1        30                   1 x 30 = 30
2        1        50                   1 x 50 = 50
3        1        30                   1 x 30 = 30
4-5      2        10                   2 x 10 = 20

Total Frequency = 10 + 30 + 50 + 30 + 20
                = 140
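
The same area calculation can be sketched in a few lines of Python. This is only an illustration: the function name `total_frequency` is made up, and the widths and frequency densities are the values as we read them from the solution table (10, 30, 50, 30, and 10), so treat them as an assumption if your copy of the chart shows something different.

```python
# (level label, bar width, frequency density) as read from the solution table.
bars = [
    ("0", 1, 10),
    ("1", 1, 30),
    ("2", 1, 50),
    ("3", 1, 30),
    ("4-5", 2, 10),
]


def total_frequency(bars):
    """Add up the area (width x frequency density) of every bar."""
    return sum(width * density for _, width, density in bars)


print(total_frequency(bars))
```

Notice that the 4-5 bar contributes 20 games even though it is only as tall as the bar for level 0: it is twice as wide, so it has twice the area.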


Histograms can't do everything

While histograms are an excellent way to display 
grouped numeric data, there are still some 
kinds of this data they’re not ideally suited for 
presenting — like running totals … 


I'd really like to be able to see at a glance how many
people play for less than a certain number of hours. 
Like, instead of seeing how many people play for 
between 3 and 5 hours, could we have a graph that 
shows how many people play for up to 5 hours? 


Let’s see if we can help the CEO out. Here’s the 
histogram we had before. 
It’s tricky to see at a glance what the running totals are in this
chart. In order to find the frequency of players playing for up to
5 hours, we need to add different frequencies together. We need
another sort of chart... but what?









What sort of information do you think we should show on the chart? What sort 
of information should we plot? Write your answer below. 


Introducing cumulative frequency 

The CEO needs some sort of chart that will show him the total
frequency below a particular value: the cumulative frequency. 
By cumulative frequency, we basically mean a running total. 

What we need to come up with is some sort of graph that shows 
hours on the horizontal axis and cumulative frequency on the 
vertical axis. That way, the CEO will be able to take a value and
read off the corresponding frequency up to that point. He’ll be able 
to find out how many people play for up to 5 hours, 6 hours, or 
whatever other number of hours he’s most interested in at the time. 

Before we can draw the chart, we need to know what exactly 
we need to plot on the chart. We need to calculate cumulative 
frequencies for each of the intervals that we have, and also work 
out the upper limit of each interval. 

Let’s start by looking at the data. 


Vital Statistics

Cumulative Frequency

The total frequency up to a certain
value. It’s basically a running total
of the frequencies.


So what are the cumulative frequencies? 


First off, let’s suppose the CEO needs to plot the cumulative frequency, or
total frequency, of up to 1 hour. If we look at the data, we know that the
frequency of the 0-1 group is 4,300, and we can see that 1 is the upper limit of
the group. This means that the cumulative frequency of hours up to 1 is 4,300.


Hours    Frequency
0-1      4,300
1-3      6,900
3-5      4,900
5-10     2,000
10-24    2,100




Next, let’s look at the total frequency up to 3. We know what the frequencies 
are for the 0-1 and 1-3 groups, and 3 is again the upper limit. To find the 
total frequency of hours up to 3, we add together the frequency of the 0-1
group and the 1-3 group. 


Can you see a pattern? If we take the upper limit of each of the groups of
hours, we can find the total frequency of hours up to that value by adding
together the frequencies. Applying this to all the groups gives us:


Hours    Frequency    Upper limit    Cumulative frequency
0        0            0              0
0-1      4,300        1              4,300
1-3      6,900        3              4,300+6,900 = 11,200
3-5      4,900        5              4,300+6,900+4,900 = 16,100
5-10     2,000        10             4,300+6,900+4,900+2,000 = 18,100
10-24    2,100        24             4,300+6,900+4,900+2,000+2,100 = 20,200
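
A running total like this is easy to sketch in Python. The snippet below is an illustration rather than anything the chapter prescribes: it uses `itertools.accumulate` from the standard library, and the variable names are our own.

```python
from itertools import accumulate

# Upper limit and frequency of each group of hours.
upper_limits = [1, 3, 5, 10, 24]
frequencies = [4300, 6900, 4900, 2000, 2100]

# accumulate() yields the running total after each group.
cumulative = list(accumulate(frequencies))

# Start from (0, 0): nobody can play for less than 0 hours.
points = [(0, 0)] + list(zip(upper_limits, cumulative))

for hours, total in points:
    print(f"up to {hours} hours: {total}")
```

The list `points` holds exactly the pairs of upper limit and cumulative frequency that go on a cumulative frequency graph.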


Note: we added a row for 0 hours, where the cumulative frequency is 0.


Drawing the cumulative frequency graph

Now that we have the upper limits and cumulative frequencies, we 
can plot them on a chart. Draw two axes, with the vertical one for the 
cumulative frequency and the horizontal one for the hours. Once you’ve 
done that, plot each of the upper limits against its cumulative frequency, 
and then join the points together with a line like this: 



Cumulative 
frequencies 
can never 
decrease. 

If your cumulative 
frequency decreases at 
any point, check your 
calculations. 


[Graph: Running Total of Hours Played. Horizontal axis: hours; vertical axis: cumulative frequency.]


Sharpen your pencil


The CEO wants you to find the number of instances of people 
playing online for up to 4 hours. See if you can estimate this 
using the cumulative frequency diagram. 
Sharpen your pencil Solution





The CEO wants you to find the number of instances of people 
playing online for less than 4 hours. See if you can estimate this 
using the cumulative frequency diagram. 

To do this, we find 4 on the horizontal axis, find where
this value meets the line of the graph, and read off the
corresponding cumulative frequency on the vertical axis.

This gives us an answer of approximately 13,750. In other
words, there are approximately 13,750 instances of
people playing online for under 4 hours.
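
Reading a value off the graph between two plotted points amounts to interpolating along the straight line that joins them. Here's a hedged Python sketch of that idea. `estimate_cumulative` is a hypothetical helper of our own, and it assumes the plotted points are joined by straight lines.

```python
# (upper limit in hours, cumulative frequency) for each plotted point.
points = [(0, 0), (1, 4300), (3, 11200), (5, 16100), (10, 18100), (24, 20200)]


def estimate_cumulative(points, value):
    """Estimate the cumulative frequency at `value` by linear interpolation."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            # Move along the straight line joining the two plotted points.
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    raise ValueError("value lies outside the plotted range")


print(estimate_cumulative(points, 4))
```

For 4 hours the straight-line estimate is 13,650. An answer read off a printed graph by eye will be close to this but probably not identical, which is why both count as estimates.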


there are no
Dumb Questions


What’s a cumulative frequency? 

The cumulative frequency of a value 
is the sum of the frequencies up to and 
including that value. It tells you the total 
frequency up to that point. 

As an example, suppose you have data 
telling you how old people are. The 
cumulative frequency for value 27 tells 
you how many people there are up to and 
including age 27. 

Are cumulative frequency graphs 
just for grouped data? 

Not at all; you can use them for 
any sort of numeric data. The key thing 
is whether you want to know the total 
frequency up to a particular value, or 
whether you’re more interested in the 
frequencies of particular values instead. 


On some charts you can show 
more than one set of data on the same 
chart. What about for cumulative 
frequency graphs? 

You can do this for cumulative 
frequency graphs by drawing a separate 
line for each set of data. If, say, you wanted 
to compare the cumulative frequencies by 
gender, you could draw one line showing 
males and the other females. It would be 
far more effective to show both lines on one 
chart, as it makes it easier to compare the 
two sets of data. 

Is there a limit to how many lines 
you can show on one chart? 

There’s no specific limit, as it all 
depends on your data. Don’t have so many 
lines that the graph becomes cluttered 
and you can no longer use it to read off 
cumulative frequencies and compare sets 
of data. 


Remind me, how do I find the 
cumulative frequency of a value? 

You can find the cumulative frequency 
by reading it straight off the graph. You 
locate the value you want to find the 
cumulative frequency for on the horizontal 
axis, find where this meets the cumulative 
frequency curve, and then read the value of 
cumulative frequency off the vertical axis. 

If I already know the cumulative 
frequency, can I use the graph to find the 
corresponding value? 

Yes you can. Look for the cumulative 
frequency on the vertical axis, find where it 
meets the cumulative frequency curve, and 
then read off the value. 
Exercise


During the Manic Mango keynote, the CEO wants to explain how he wants to target particular age 
groups. He has a cumulative frequency graph showing the cumulative frequency of the ages, but 
he needs the frequencies too, and the dog ate the piece of paper they were written on. See if you 
can use the cumulative frequency graph to estimate what the frequencies of each group are. 


[Graph annotation, partly illegible: "... this is because someone is classed as being 17 ..."]
Exercise Solution



During the Manic Mango keynote, the CEO wants to explain how he wants to target particular age 
groups. He has a cumulative frequency graph showing the cumulative frequency of the ages, but 
he needs the frequencies too, and the dog ate the piece of paper they were written on. See if you 
can use the cumulative frequency graph to piece together what the frequencies of each group are. 


Age group    Upper limit    Cumulative frequency    Frequency
<0           0              0                       0
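
Going from cumulative frequencies back to frequencies is just the running total in reverse: subtract each cumulative frequency from the one after it. The age values read off the CEO's graph aren't reproduced here, so this Python sketch uses the hours-played running totals from earlier in the chapter as stand-in data. `frequencies_from_cumulative` is a name we've made up.

```python
def frequencies_from_cumulative(cumulative):
    """Recover each group's frequency from a list of running totals.

    The list must start with the cumulative frequency at the lowest limit
    (normally 0), followed by one value per group.
    """
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Stand-in data: cumulative frequencies of hours played at 0, 1, 3, 5, 10 and 24 hours.
cumulative = [0, 4300, 11200, 16100, 18100, 20200]

print(frequencies_from_cumulative(cumulative))
```

Because cumulative frequencies can never decrease, every recovered frequency should be zero or more. A negative result means a value was misread from the graph.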


Here are two possible charts that the CEO could use in his keynote. Your 
task is to annotate each one, and say what you think the strengths and 
weaknesses are of each one relative to the other. Which would you pick? 


[Chart 1: Profit in dollars by Year, 2003 to 2007]

[Chart 2: Profit in dollars by Year. Series: Manic Mango, Competitor]


Choosing the right chart 

The CEO is really happy with your work on cumulative frequency graphs, and
your bonus is nearly in the bag. He’s nearly finished preparing for the keynote, 
but there’s just one more thing he needs: a chart showing Manic Mango profits 
compared with the profits of their main rivals. Which chart should he use? 
Exercise Solution


Here are two possible charts that the CEO could use in his keynote. Your task is to annotate 
each one, and say what you think the strengths and weaknesses are of each one relative to the 
other. Which would you pick? 


[Chart: Profit in dollars by Year, 2003 to 2007]
Line charts should be used for numerical data
only, and not categorical. This is because it
makes sense to compare different categories, 
but not to draw a trend line. Only use a line 
chart if you’re comparing categories over some 
numerical unit such as time, and in that case 
you’d use a separate line for each category. 




Line Charts Up Close


Line charts are good at showing trends in your data. For each set of data, you plot your 
points and then join them together with lines. You can easily show multiple sets of data 
on the same chart without it getting too cluttered. Just make sure it’s clear which line is 
which. 

As with other sorts of charts, you have a choice of showing frequency or percentages on 
the vertical axis. The scale you use all depends on what key facts you want to draw out. 

Line charts are often used to show time measurements. Time always goes on the 
horizontal axis, and frequency on the vertical. You can read off the frequency for any 
period of time by choosing the time value on the horizontal axis, and reading off the 
corresponding frequency for that point on the line. 
BULLET POINTS

■ Cumulative frequency is the total frequency up to a 
particular value. It's a running total of the frequencies. 

■ Use a cumulative frequency graph to plot the upper limit 
of each group of data against cumulative frequency. 

■ Use a line chart if you want to show trends, for example 
over time. 

■ You can show more than one set of data on a line chart. 
Use one line for each set of data, and make sure it’s 
clear which line is which. 


■ You can use line charts to make basic predictions as it’s 
easy to see the shape of the trend. Just extend the trend 
line, trying to keep the same basic shape. 

■ Don’t use line charts to show categorical data unless 
you’re showing trends for each category, for example 
over time. If you do this, draw one line per category. 


Are line charts the same thing as 
time series charts? I think I’ve heard that 
name used before. 

A time series chart is really a line chart 
that focuses on time intervals, just like the 
examples we used. A line chart doesn’t have 
to focus on just time, though. 

Are there any special varieties of 
line charts? 

Yes. In fact, you’ve encountered one 
of them already. The cumulative frequency 
graph is a type of line chart that shows the 
total frequency up to a certain value.



Can line charts show categorical 
data as well as data that’s numeric? 

Line charts should only be used to 
show categorical data if you’re showing 
trends for each category, and use a separate 
line for each category. 

What you shouldn't do is use a line chart to 
draw lines from category to category. 

So line charts are better for 
showing overarching trends, and bar 
charts are better for comparing values or 
categories? 

That’s right. Which chart you use really 
comes down to what message you want to 
put across, and what key facts you want to 
minimize. 


Now that I know how to create 
charts properly, can I use charting 
software to do the heavy lifting? 

Absolutely! Charting software can save 
you a lot of time and hard work, and the 
results can be excellent. 

The key thing with using software to produce 
your charts is to remember that the software 
can’t think for you. You still have to decide 
which chart best represents your key facts, 
and you have to check that the software 
produces exactly what you expect it to. 




Manic Mango conquered the games market! 


You’ve helped produce some killer charts for Manic Mango, and thanks to you, 
the keynote was a huge success. Manic Mango has gained tons of extra publicity 
for their games, and money from sponsorship and advertising is rolling in. The 
only thing left for you to do is think about all the things you could do and the 
places you could go with your well-earned bonus. 

You’ve had your first taste of how statistics can help you and what you can achieve 
by understanding what’s really going on. Keep reading and we’ll show you more 
things you can do with statistics, and really start to flex those statistics muscles. 



Nice work with those 
charts! We've got investors
lining up outside the office. 
Take a long vacation, on me! 





2 measuring central tendency


The Middle Way



Sometimes you just need to get to the heart of the matter. 

It can be difficult to see patterns and trends in a big pile of figures, and finding the 
average is often the first step towards seeing the bigger picture. With averages at 
your disposal, you’ll be able to quickly find the most representative values in your 
data and draw important conclusions. In this chapter, we’ll look at several ways to
calculate one of the most important statistics in town — mean, median, and mode — 
and you’ll start to see how to effectively summarize data as concisely and usefully 
as possible. 


Welcome to the Health Club 


The Statsville Health Club prides itself on its ability to find 
the perfect class for everyone. Whether you want to learn 
how to swim, practice martial arts, or get your body into 
shape, they have just the right class for you. 

The staff at the health club have noticed that their 
customers seem happiest when they’re in a class with 
people their own age, and happy customers always come 
back for more. It seems that the key to success for the health 
club is to work out what a typical age is for each of their 
classes, and one way of doing this is to calculate the 
average. The average gives a representative age for each 
class, which the health club can use to help their customers 
pick the right class. 

Here are the current attendees of the Power Workout class: 



Statsville's Premier Spa

[Illustration: the Power Workout class attendees, each labeled with their age]
How do we work out the average age of the Power Workout class?


A common measure of average is the mean 

It’s likely that you’ve been asked to work out averages before. One way to find the 
average of a bunch of numbers is to add all the numbers together, and then divide 
by how many numbers there are. 

In statistics, this is called the mean. 



What’s wrong with just calling
it the average? It’s what I’m
used to.


Because there’s more than one sort of average. 

You have to know what to call each average, so you can easily 
communicate which one you’re referring to. It’s a bit like going to your 
local grocery store and asking for a loaf of bread. The chances are 
you’ll be asked what sort of bread you’re after: white, whole-grain, etc. 
So if you’re writing up your sociology research findings, for example, 
you’ll be expected to specify exactly what kinds of average calculations 
you did. 


Likewise, if someone tells you what the average of a set of data is, 
knowing what sort of average it is gives you a better understanding of 
what’s really going on with the data. It can give you vital clues about 
what information is being conveyed — or, in some cases, concealed. 

We’ll be looking at other types of averages, besides the mean, later in 
this chapter. 





Mean math 

If you want to really excel with statistics, you’ll need to 
become comfortable with some common stats notation. It 
may look a little strange at first, but you’ll soon get used to it. 


Letters and numbers


Almost every statistical calculation involves adding a bunch of 
numbers together. As an example, if we want to find the mean of 
the Power Workout class, we first have to add the ages of all the 
class attendees together. 

The problem statisticians have is how to generalize this. We don’t 
necessarily know in advance how many numbers we’re dealing with, 
or what they are. We currently know how many people are in the 
Power Workout class and what their ages are, but what if someone 
else joins the class? If we could only generalize this, we’d have a 
way of showing the calculation without rewriting it every time the 
class changes. 

Statisticians get around this problem by using letters to represent 
numbers. As an example, they might use the letter x to represent 
ages in the Power Workout class like this: 


Specific ages of class attendees:   19   20   20   20   21

General ages of class attendees:    x₁   x₂   x₃   x₄   x₅

Each x represents the age of a separate person in the class. It’s a
bit like labeling each person with a particular number x.


Now that we have a general way of writing 
ages, we can use our x’s to represent them in 
calculations. We can write the sum of the 5 ages 
in the class as 

Sum = x₁ + x₂ + x₃ + x₄ + x₅


But what if we don’t know how 
many numbers we have to sum? 
What if we don’t know how many 
people are in the class? 
Dealing with unknowns


Statisticians use letters to represent unknown numbers. But what if we don’t 
know how many numbers we might have to add together? Not a problem — 
we’ll just call the number of values n. If we didn’t know how many people were 
in the Power Workout class, we’d just say that there were n of them, and write 
the sum of all the ages as: 

Sum = x₁ + x₂ + x₃ + x₄ + x₅ + ... + xₙ

In this case, xₙ represents the age of the nth person in the class. If there were 18
people in the class, this would be x₁₈, the age of the 18th person.



Writing out all 
those x's looks like it 
could get arduous... 


We can take another shortcut. 

Writing x₁ + x₂ + x₃ + x₄ + ... + xₙ is a bit like saying “add age
1 to age 2, then add age 3, then add age 4, and keep on adding
ages up to age n.” In day-to-day conversation it’s unlikely we’d
phrase it like this. We’re far more likely to say “add together all
of the ages.” It’s quicker, simpler, and to the point.

We can do something similar in math notation by using the
summation symbol Σ, which is the Greek letter Sigma. We can
use Σx (pronounced “sigma x”) as a quick way of saying “add
together the values of all the x’s.”


x₁ + x₂ + x₃ + x₄ + x₅ + ... + xₙ = Σx


Do you see how much quicker and simpler this is? It’s just 
a mathematical way of saying “add your values together” 
without having to explicitly say what each value is. 

Now that we know some handy math shortcuts, let’s see how 
we can apply this to the mean. 


Back to the mean

We can use math notation to represent the mean. 

To find the mean of a group of numbers, we add them all together, 
and then divide by how many there are. We’ve already seen how to 
write summations, and we’ve also seen how statisticians refer to the 
total count of a set of numbers as n. 

If we put these together, we can write the mean as: 

    Σx / n

Add all the numbers together (Σx), then divide by however many
there are (n).

In other words, this is just a math shorthand way of saying “add 
together all of the numbers, and then divide by how many 
numbers there are.” 


The mean has its own symbol 

The mean is one of the most commonly used statistics around, 
and statisticians use it so frequently that they’ve given it a symbol 
all of its own: μ. This is the Greek letter mu (pronounced “mew”).
Remember, it’s just a quick way of representing the mean. 
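
If it helps to see the formula as code, here's a minimal Python sketch of μ = Σx / n. The function name `mean` and the sample values are hypothetical and chosen only for illustration; they aren't taken from the health club data.

```python
def mean(values):
    """Return the mean: the sum of the values (Σx) divided by how many there are (n)."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


# Hypothetical values, just to show the calculation.
sample = [2, 4, 9]
print(mean(sample))
```

Here `sum(values)` plays the role of Σx and `len(values)` plays the role of n.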



The mean is one of
the most frequently
used statistics. It
can be represented
with the symbol μ.
Sharpen your pencil


Have a go at calculating the mean age of the Power Workout
class. Here are their ages.

Age          19    20    21
Frequency     1     3     1

(The frequency is how many people there are of each age.)


Five Minute Mystery

The Case of the Ambiguous Average


The staff at a local company are feeling mutinous about
perceived unfair pay. Most of them are paid $500 per week, a
few managers are on a higher salary, and the CEO takes home
$49,000 per week.

“The average salary here is $2,500 per week, and we’re only
paid $500,” say the workers. “This is unfair, and we
demand more money.”

One of the managers overhears this and joins in with the
demands. “The average salary here is $10,000 per week,
and I’m only paid $4,000. I want a raise.”

The CEO looks at them all. “You’re all wrong; the average
salary is $500 per week. Nobody is underpaid. Now get back to
work.”


What’s going on with the average? Who do you
think is right?


Sharpen your pencil Solution


Have a go at calculating the mean age of the Power Workout
class. Here are their ages.

Age          19    20    21
Frequency     1     3     1

To find μ, we need to add all the people’s ages, and divide by how many there are.
This gives us μ = (19 + 20 + 20 + 20 + 21) ÷ 5
               = 100 ÷ 5
               = 20

The mean age of the class is 20.




Handling frequencies 

When you calculate the mean of a set of numbers, you’ll often find that 
some of the numbers are repeated. If you look at the ages of the Power 
Workout class, you’ll see we actually have 3 people of age 20. 

It’s really important to make sure that you include the frequency of each 
number when you’re working out the mean. To make sure we don’t 
overlook it, we can include it in our formula. 

If we use the letter f to represent frequency, we can rewrite the mean as

μ = Σfx / Σf

Here Σfx means: multiply each number by its frequency, then add
the results together. Σf is the sum of the frequencies.



This is just another way of writing the mean, but this time explicitly 
referring to the frequency. Using this for the Power Workout class gives us 

μ = (1 x 19 + 3 x 20 + 1 x 21) ÷ 5

  = 20

It’s the same calculation written slightly differently. 
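
The frequency version of the formula can be sketched in Python like this. The function name `mean_from_frequencies` is our own invention; the data is the Power Workout table of ages and frequencies.

```python
def mean_from_frequencies(table):
    """Return Σfx / Σf for a list of (value, frequency) pairs."""
    total = sum(value * frequency for value, frequency in table)  # Σfx
    count = sum(frequency for _, frequency in table)              # Σf
    if count == 0:
        raise ValueError("the frequencies must not sum to zero")
    return total / count


# Power Workout class: (age, frequency).
power_workout = [(19, 1), (20, 3), (21, 1)]
print(mean_from_frequencies(power_workout))
```

Writing out the list 19, 20, 20, 20, 21 and averaging it gives the same result; the frequency table just saves you from repeating the 20 three times.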


Back to the Health Club

Here’s another hopeful customer looking for the perfect 
class. Can you help him find one? 



I want a nice quiet class on a Tuesday 
evening where I can meet people my age. 
Do you think you can help me? 


This sounds easy enough to sort out. According to the 
brochure, the Health Club has places available in three 
of its Tuesday evening classes. The first class has a mean 
age of 17, the second has a mean of 25, and the mean 
age of the third one is 38. Clive needs to find the class 
with an average student age that’s closest to his own. 




[Illustration: Clive]




Look at the mean ages for each class. 
Which class should Clive attend? 


Everybody was Kung Fu fighting

Clive went along to the class with the mean age of 38. He was 
expecting a gentle class where he could get some nonstrenuous 
exercise and meet other people his own age. Unfortunately... 



I ended up in the Kung Fu 
class with lots of young 'uns and
a few ancient masters. My back 
will never be the same again. 


What could have gone wrong? 

The last thing Clive expected (or wanted) was a 
class that was primarily made up of teenagers. 
Why do you think this happened? 

We need to examine the data to find out. Let’s 
see if sketching the data helps us see what the 
problem is. 








Sketch the histograms for the Kung Fu and Power Workout classes. (If you need a refresher on
histograms, flip back to Chapter 1.) How do the shapes of the distributions compare? Why was
Clive sent to the wrong class?

Power Workout Classmate Ages

Age          19    20    21
Frequency     1     3     1

Kung Fu Classmate Ages

Age          19    20    21    145    147
Frequency     3     6     3      1      1
Exercise Solution

[Histogram: Age of Power Workout classmates. Horizontal axis: age, 18 to 23.]

[Histogram: Age of Kung Fu classmates. Horizontal axis: age, 19 to 22 and 145 to 148.]


Sharpen your pencil


Do you think the mean can ever be the highest value 
in a set of numbers? Under what circumstances? 




[Histogram: Age of Kung Fu students. Horizontal axis: age, 19 to 22 and 145 to 148.]

Most of the people in the class are around age 20.








Our data has outliers 

Did you see the difference in the shape of the charts for the Power 
Workout and Kung Fu classes? The ages of the Power Workout 
class form a smooth, symmetrical shape. It’s easy to see what a 
typical age is for people in the class. 

The shape of the chart for the Kung Fu class isn’t as straightforward. 
Most of the ages are around 20, but there are two masters whose 
ages are much greater than this. Extreme values such as these are
called outliers.







What would the mean have been if the ancient 
masters weren’t part of the class? Compare this 
with the actual mean. What does this tell you 
about the effect of the outliers? 


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introducing outliers 

The outliers did it

If you look at the data and chart of the Kung Fu class, it’s easy to 
see that most of the people in the class are around 20 years old. In 
fact, this would be the mean if the ancient masters weren’t in the 
class. 

We can’t just ignore the ancient masters, though; they’re still part of 
the class. Unfortunately, the presence of people who are way above 
the “typical” age of the class distorts the mean, pulling it upwards. 
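
To see how far the ancient masters pull the mean, here is a minimal Python sketch that computes the mean from the Kung Fu frequency table, with and without them. The function name and the cutoff of 100 used to pick out the masters are assumptions made for illustration; they are not part of the exercise.

```python
# Kung Fu class ages as (age, frequency) pairs, taken from the frequency table.
kung_fu = [(19, 3), (20, 6), (21, 3), (145, 1), (147, 1)]


def mean_from_frequencies(table):
    """Return sum(frequency * value) / sum(frequency) for (value, frequency) pairs."""
    total = sum(age * freq for age, freq in table)
    count = sum(freq for _, freq in table)
    return total / count


# Hypothetical cutoff: treat anyone aged 100 or over as an ancient master.
without_masters = [(age, freq) for age, freq in kung_fu if age < 100]

mean_all = mean_from_frequencies(kung_fu)
mean_without = mean_from_frequencies(without_masters)

print("Mean with the masters:   ", mean_all)
print("Mean without the masters:", mean_without)
```

With the masters included, the mean is 38; without them it is 20, which matches what most of the class looks like.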

[Histogram: Age of Kung Fu students (x-axis: age)]


Vital Statistics: Outlier

An extreme high or low value that stands out from the
rest of the data.




Can you see how the outliers have pulled the mean higher? This 
effect is caused by outliers in the data. When this happens, we say 
the data is skewed. 


Vital Statistics: Skewed Data

When outliers “pull” the data to the left or right.


The Kung Fu class data is skewed to the right because if you line 
the data up in ascending order, the outliers are on the right. 

Let’s take a closer look at this. 


Sharpen your pencil Solution


Do you think the mean can ever be the highest value in a set of
numbers? Under what circumstances?


Yes, it can. The mean is the highest value if all the numbers in the set are the same.


Skewed to the right 

Data that is skewed to the right has a “tail” of 
high outliers that trail off to the right. If you 
look at a right-skewed chart, you can see this 
tail. The high outliers in the Kung Fu class data 
distort the mean, pulling it higher — that is, to 
the right. 



[Chart: right-skewed data. Most values are at the low end, with a tail of high outliers trailing off to the right.]





Skewed to the left 

Here’s a chart showing data that is skewed to the left. Can
you see the tail of outliers on the left? This time the outliers 
are low, and they pull the mean over to the left. In this 
situation, the mean is lower than the majority of values. 


Symmetric data 

In an ideal world, you’d expect data to be symmetric. 
If the data is symmetric, the mean is in the middle. 
There are no outliers pulling the mean in either 
direction, and the data has about the same shape on 
either side of the center. 


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mean conversation 


Watercooler conversation



Hey Clive! I heard you joined the Kung Fu
class. That’s really unexpected...



Clive: They told me the average age for the class is about 38, so 
I thought I’d fit in alright. I had to sit down after 5 minutes before 
my legs gave out. 

Bendy Girl: But I didn’t see anyone that age in the class, so 
there must have been some sort of mistake in their calculations. 
Why would they tell you that? 

Clive: I don’t think their calculations were wrong; they just didn’t 
tell me what I really needed to know. I asked them what a typical 
sort of age is for the class, and they gave me the mean, 38. 

Bendy Girl: And that’s not really typical, is it? I mean, just 
looking at the people in the class, I would’ve thought that a 
younger age would be a bit more representative. 


Clive: If only they’d left the Ancient Masters out of their 
calculations, I would’ve known not to go to the class. That’s what 
did it; I’m sure of it. They distorted their whole calculation. 

Bendy Girl: Well, if the Ancient Masters are such a big 
problem, why can’t they just ignore them? Maybe that way they 
could come up with a more typical age for the class... 
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measuring central tendency 


Finding the median 

If the mean becomes misleading because of skewed data and outliers, 
then we need some other way of saying what a typical value is. We can 
do this by, quite literally, taking the middle value. This is a different 
sort of average, and it’s called the median. 

To find the median of the Kung Fu class, line up all the ages in 
ascending order, and then pick the middle value, like this: 


19 19 20 20 20 21 21 100 102 



The middle value is 20.


If you line all the ages up in ascending order, the value 20 is exactly 
halfway along. Therefore, the median of the Kung Fu class is 20. 

What if there had been an even number of people in the class? 

19 20 20 20 21 21 100 102 



If there’s an even number of people in the class, there will
be no single middle number.


The median is always in the middle. It’s the middle value.


If you have an even set of numbers, just take the mean of the two 
middle numbers (add them together, and divide by 2), and that’s 
your median. In this case, the median is 20.5. 


We’ve seen that if you have 9 numbers, the median is the number at position 
5. If you have 8 numbers, it’s the number at position 4.5 (halfway between 
the numbers at position 4 and 5). What about if you have n numbers? 
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calculating the median: step-by-step 


How to find the median in three steps:


1. Line your numbers up in order, from smallest to largest.

2. If you have an odd number of values, the median is the one
in the middle. If you have n numbers, the middle number is
at position (n + 1) / 2.

3. If you have an even number of values, get the median by
adding the two middle ones together and dividing by 2.

You can find the midpoint by calculating (n + 1) / 2. The
two middle numbers are on either side of this point.
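
The three steps above can be sketched in Python as follows. This is only an illustration under the assumption that the data is a plain list of numbers; the function name `median` is ours, and Python's standard library offers `statistics.median` for the same job.

```python
def median(values):
    """Find the median by sorting and picking (or averaging) the middle."""
    if not values:
        raise ValueError("median of an empty data set is undefined")

    # Step 1: line the numbers up in order, from smallest to largest.
    ordered = sorted(values)
    n = len(ordered)

    # Step 2: with an odd number of values, the median sits at
    # position (n + 1) / 2, counting from 1.
    if n % 2 == 1:
        position = (n + 1) // 2
        return ordered[position - 1]  # lists are indexed from 0

    # Step 3: with an even number of values, (n + 1) / 2 falls between
    # two positions, so take the mean of the values on either side.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


nine_ages = [19, 19, 20, 20, 20, 21, 21, 100, 102]
eight_ages = [19, 20, 20, 20, 21, 21, 100, 102]

print(median(nine_ages))   # middle value of nine numbers
print(median(eight_ages))  # mean of the two middle values of eight numbers
```

For the nine ages the result is 20, and for the eight ages it is 20.5, matching the worked examples.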


There are no
Dumb Questions


Q: Is it still OK to use the mean with skewed data if I really
want to?

A: You can, and people often do. However, in this situation the
mean won’t give you the best representation of what a typical value
is. You need the median.

Q: You say that, but surely the whole point of the mean is
that it gives a typical value. It’s the average.

A: The big danger is that the mean will give a value that doesn’t
exist in the data set. Take the Kung Fu class as an example. If you
were to go into the class and pick a person at random, the chances
are that person would be around 20 years old because most people
in the class are that sort of age. Just going with the mean doesn’t
give you that impression. Finding the median can give you a more
accurate perspective on the data.

But sometimes even the median will give a value that’s not in the
data set, like our example on the previous page. That’s precisely why
there’s more than one sort of average; sometimes you need to use
different methods in order to accurately say what a typical value is.


Q: So is the median better than the mean?

A: Sometimes the median is more appropriate than the mean,
but that doesn’t make it better. Most of the time you’ll need to use
the mean because it usually offers significant advantages over the
median. The mean is more stable when you are sampling data. We’ll
come back to this later in the book.

Q: How do I use the mean or median with categorical data?
What about examples like the data on page 9 of Chapter 1?

A: You can only find the mean and median of numerical data.
Don’t worry, though, there’s another sort of average that deals with
just this problem that we’ll explore later on.

Q: I always get right- and left-skewed data mixed up. How do
I remember which is which?

A: Skewed data has a “tail” of outliers. To see which direction
the data is skewed in, find the direction the tail is pointing in. For
example, right-skewed data has a tail that points to the right.


BE the data

Your job is to play like you’re the data, and
say what the median is for each set, whether
the data is skewed, and whether the mean
is higher or lower than the median.
Give reasons why.


Values      1  2  3  4  5  6  7  8
Frequency   4  6  4  4  3  2  1  1


Values      1  4  6  8  9  10  11  12
Frequency   1  1  2  3  4   4   5   5
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be the data solution 



BE the data Solution

Your job is to play like you’re the data, and
say what the median is for each set, whether
the data is skewed, and whether the mean
is higher or lower than the median.
Give reasons why.


Values      1  2  3  4  5  6  7  8
Frequency   4  6  4  4  3  2  1  1


There are 25 numbers, and if you line them all up, the median is halfway
along, i.e., 13 numbers along. The median is 3. The data is skewed to the
right, which pulls the mean higher. Therefore, the mean is higher than the
median.


Values      1  4  6  8  9  10  11  12
Frequency   1  1  2  3  4   4   5   5


The median here is 10. The data is skewed to the left, so the mean is
pulled to the left. Therefore, the mean is lower than the median.


If the data is skewed to the right,
the mean is to the
right of the median (higher).


If the data is skewed to
the left, the mean is to the
left of the median (lower).
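
As a rough check on the two Be the Data answers, the sketch below expands each frequency table into a full list of values and compares the mean with the median. The helper names are assumptions for illustration, and comparing mean and median is only the rule of thumb used in this chapter, not a formal test for skewness.

```python
from statistics import mean, median


def expand(values, frequencies):
    """Turn a frequency table into the full list of values it describes."""
    return [v for v, f in zip(values, frequencies) for _ in range(f)]


def describe(values, frequencies):
    """Return (mean, median, direction the mean is pulled relative to the median)."""
    data = expand(values, frequencies)
    m, med = mean(data), median(data)
    if m > med:
        direction = "right"
    elif m < med:
        direction = "left"
    else:
        direction = "none"
    return m, med, direction


first = describe([1, 2, 3, 4, 5, 6, 7, 8], [4, 6, 4, 4, 3, 2, 1, 1])
second = describe([1, 4, 6, 8, 9, 10, 11, 12], [1, 1, 2, 3, 4, 4, 5, 5])

print(first)   # mean above the median: pulled to the right
print(second)  # mean below the median: pulled to the left
```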


business is booming 


Your work on averages is really paying off. More and more people are 
turning up for classes at the Health Club, and the staff is finding it 
much easier to find classes to suit the customers. 

This teenager is after a swimming class where he can make new 
friends his own age. 


The swimming 
class you have for 
teenagers sounds cool! 
Sign me up right now. 


The swimming class has a mean age of 17, and 
coincidentally, that’s the median too. It sounds like this 
class will be perfect for him. 


The Health Club

Statsville's Premier Spa

Swimming Class
Median age: 17




Let’s see what happens... 
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when good medians go bad 


The Little Ducklings swimming class

The Little Ducklings class meets at the swimming pool twice a 
week. In this class, parents teach their very young children how to 
swim, and they all have lots of fun splashing about in the water. 

Look who turned up for lessons... 







What do you think might have gone wrong this time? 
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measuring central tendency 



Frequency Magnets


Here are the ages of people who go to the Little Ducklings class, but
some of the frequencies have fallen off. Your task is to put them in the
right slot in the frequency table. Nine children and their parents go to
the class, and the mean and median are both 17.


Age         1  2  3  31  32  33
Frequency   3  ?  2   2   ?   ?

Magnets to place: 3, 4, 4


Sharpen your pencil


When you’ve figured out the frequencies for the Little Ducklings 
class, sketch the histogram. What do you notice? 


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exercise solutions 




[Histogram: Age of Little Ducklings classmates (x-axis: age, 1 to 4 and 31 to 34). Annotations: "The mean and median are here." "The data doesn’t look like one set of data, but two: one for the parents and one for the children."]


Frequency Magnets Solution


Here are the ages of people who go to the Little Ducklings class, but
some of the frequencies have fallen off. Your task is to put them in the
right slot in the frequency table. Nine children and their parents go to
the class, and the mean and median are both 17.


Age         1  2  3  31  32  33
Frequency   3  4  2   2   4   3


We’re told there are 9 children, so the frequencies of the
children must add up to 9. There must be 4 children of age 2.


The mean is 17. If we substitute in a and b for the
unknown frequencies of ages 32 and 33, we get

    (1×3 + 2×4 + 3×2 + 31×2 + 32a + 33b) / 18 = 17

Multiply both sides by 18:

    3 + 8 + 6 + 62 + 32a + 33b = 17 × 18 = 306

    32a + 33b = 306 - (3 + 8 + 6 + 62) = 306 - 79

    32a + 33b = 227

As 32a + 33b is odd, this means that b must be 3,
and a must be 4.

Sharpen your pencil Solution


When you’ve figured out the frequencies for the Little Ducklings
class, sketch the histogram. What do you notice?

[Histogram: Age of Little Ducklings classmates (x-axis: age; y-axis: frequency)]


What went wrong with the mean and median?

Let’s take a closer look at what’s going on. 

Here are the ages of people who go to the Little Ducklings class. 


1 1 1 2 2 2 2 3 3 31 31 32 32 32 32 33 33 33 



There’s an even number of values, so the median
is halfway between 3 and 31. Take
the mean of these two numbers, (3 + 31) / 2,
and you get 17.


The mean and median for the class are both 17, even though there are 
no 17-year-olds in the class! 

But what if there had been an odd number of people in the class. Both 
the mean and median would still have been misleading. Take a look: 


1 1 1 2 2 2 2 2 3 (3) 31 31 32 32 32 32 33 33 33

If we add another 2-year-old
to the class, the median becomes
3. But what about the adults?


If another two-year-old were to join the class, like we see above, the
median would become 3. This reflects the age of the children, but
doesn’t take the adults into account.

1 1 1 2 2 2 2 3 3 (31) 31 32 32 32 32 33 33 33 33

If we add another 33-year-old
to the class, the median instead
becomes 31. This time, we ignore the
kids!

If another 33-year-old were added to the class instead, the median 
would be 31. But that fails to reflect all the kids in the class. 

Whichever value we choose for the average age, it seems misleading. 
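
Here is a short Python sketch of how jumpy the median is for this class. It uses the standard library's `statistics` module; the list is simply the ages shown above, and the variable names are ours.

```python
from statistics import mean, median

# Ages of everyone in the Little Ducklings class.
ducklings = [1] * 3 + [2] * 4 + [3] * 2 + [31] * 2 + [32] * 4 + [33] * 3

class_mean = mean(ducklings)
class_median = median(ducklings)

# One extra person is enough to move the median across the gap.
median_with_extra_child = median(ducklings + [2])
median_with_extra_parent = median(ducklings + [33])

print("Mean:", class_mean)
print("Median:", class_median)
print("Median with another 2-year-old:", median_with_extra_child)
print("Median with another 33-year-old:", median_with_extra_parent)
```

The mean and median both come out at 17, but a single new classmate moves the median to 3 or to 31, depending on which group they join.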


What should we do for data like this? 
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sharpen your pencil 



Here’s where you have to really think about how you can best 
give a representative age (or ages) for the Little Ducklings class. 
Here’s a reminder of the data: 


Age         1  2  3  31  32  33
Frequency   3  4  2   2   4   3


1. Why do you think the mean and median both failed for this data? Why are they misleading? 


2. If you had to pick one age to represent this class, what would it be? Why? 


3. What if you could pick two ages instead? Which two ages would you pick, and why? 
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measuring central tendency 



The Mean

This week’s interview:

The many types of average


Head First: Hey, Average, great to have you on the
show.

Mean: Please, call me Mean.

Head First: Mean? But I thought you were
Average. Did we mix up the guest list?

Mean: Not at all. You see, there’s more than one
type of Average in Statsville, and I’m one of them,
the Mean.

Head First: There’s more than one Average? That
sounds kinda complicated.

Mean: Not really, not once you get used to it. You
see, we all say what a typical value is for a set of
numbers, but we have different opinions about how
to say what that is.

Head First: So which one of you is the real
Average? You know, the one where you add all the
numbers together, and then divide by however many
numbers there are?

Mean: That’s me, but please don’t call me the “real”
Average; the other guys might get offended. The
truth is that a lot of people new to Statsville see me
as being Mr. Average. I have the same calculation
that students see when they first encounter Averages
in basic arithmetic. It’s just that in Statsville, I’m
called Mean to differentiate between the other sorts
of Average.

Head First: So do you have any other names?

Mean: Well, I do have a symbol, μ. All the rock stars
have them. Well, some of them do. I do anyway. It’s
Greek, so that makes me exotic.

Head First: So why are any of the other sorts of
Average needed?


Mean: I hate to say it, but I have weaknesses. I lose
my head a bit when I deal with data that has outliers.
Without the outliers I’m fine, but then when I see
outliers, I get kinda mesmerized and move towards
them. It’s led to a few problems. I can sometimes
end up well away from where most of the values are.
That’s where Median comes in.

Head First: Median?

Mean: He’s so level-headed when it comes to
outliers. No matter what you throw at him, he always
stays right in the middle of the data. Of course, the
downside of the Median is that you can’t calculate
him as such; you can only work out what position he
should be in. It makes him a bit less useful further
down the line.

Head First: Do the two of you ever have the same
value?

Mean: We do if the data’s symmetric; otherwise,
there tend to be differences between us. As a general
rule, if there are outliers, then I tend to wander
towards them, while Median stays where he is.

Head First: We’re running out of time, so here’s
one final question. Are there any situations where
both you and Median have problems saying what a
typical value is?

Mean: I’m afraid there are. Sometimes we need a
little helping hand from another sort of Average. He
doesn’t get out all that much, but he’s a useful guy to
know. Stick around, and I’ll show you some of the
things he’s up to.

Head First: Sounds great!
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sharpen your pencil solution 



Here’s where you have to really think about how you can best 
give a representative age (or ages) for the Little Ducklings class. 
Here’s a reminder of the data: 


Age         1  2  3  31  32  33
Frequency   3  4  2   2   4   3


1. Why do you think the mean and median both failed for this data? Why are they misleading?

Both the mean and median are misleading for this set of data because neither fully
represents the typical ages of people in the class. The mean suggests that teenagers
go to the class, when in fact there are none. The median also has this problem, but it
can fluctuate wildly if other people join the class.


2. If you had to pick one age to represent this class, what would it be? Why?

It’s not really possible to pick a single age that fully represents the ages in the class.
The class is really made up of two sets of ages, those of the children and those of
the parents. You can’t really represent both of these groups with a single number.


3. What if you could pick two ages instead? Which two ages would you pick, and why?

As it looks like there are two sets of data, it makes sense to pick two ages to
represent the class, one for the children and one for the parents. We’d choose 2
and 32, as these are the two age groups with the most people in them.


[Bar chart: Number of sessions by class type (categories include Power Workout and Kung Fu; x-axis: number of sessions)]


Introducing the mode

In addition to the mean and median, there’s a third type of 
average called the mode. The mode of a set of data is the most 
popular value, the value with the highest frequency. Unlike the 
mean and median, the mode absolutely has to be a value in the 
data set, and it’s the most frequent value. 

Sometimes data can have more than one mode. If there is 
more than one value with the highest frequency, then each 
one of these values is a mode. If the data looks as though it’s 
representing more than one trend or set of data, then we can 
give a mode for each set. If a set of data has two modes, then we 
call the data bimodal. 

This is exactly the situation we have with the Little Ducklings 
class. There are really two sets of ages we’re looking at, one for 
parents and one for children, so there isn’t a single age that’s 
totally representative of the entire class. Instead, we can say 
what the mode is for each set of ages. In the Little Ducklings 
class, ages 2 and 32 have the highest frequency, so these ages are 
both modes. On a chart, the modes are the ones with the highest 
frequencies. 

It even works with categorical data 

The mode doesn’t just work with numeric data; it works 
with categorical data, too. In fact, it’s the only sort of average 
that works with categorical data. When you’re dealing with 
categorical data, the mode is the most frequently occurring 
category. 

You can also use it to specify the highest frequency group of 
values. The category or group with the highest frequency is 
called the modal class. 


Age         1  2  3  31  32  33
Frequency   3  4  2   2   4   3

These two values, 2 and 32, are the most
popular, so they’re both modes.


[Histogram: Age of Little Ducklings classmates (x-axis: age). The modes are the values with the highest frequencies. This data is bimodal because there are two modes.]


The Health Club

Statsville's Premier Spa

Swimming Class

Mode ages: 2 and 32
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calculating the mode: step-by-step 


Three steps for finding the mode:

1. Find all the distinct categories or values in your set of data.

2. Write down the frequency of each value or category.

3. Pick out the one(s) with the highest frequency to get the mode.
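
The three steps above can be sketched in Python like this. It is a minimal illustration, not the only way to do it: `Counter` from the standard library handles steps 1 and 2, and the function names are assumptions. Because a data set can have more than one mode, the function returns a list.

```python
from collections import Counter


def frequency_table(values):
    """Steps 1 and 2: find the distinct values and count how often each occurs."""
    return Counter(values)


def modes(table):
    """Step 3: pick out every value or category with the highest frequency."""
    if not table:
        return []
    highest = max(table.values())
    return [value for value, freq in table.items() if freq == highest]


# Numeric data: the Little Ducklings ages.
ducklings = [1] * 3 + [2] * 4 + [3] * 2 + [31] * 2 + [32] * 4 + [33] * 3
duckling_modes = modes(frequency_table(ducklings))

# Categorical data works too, because only the counts matter.
colors = {"Blue": 4, "Red": 5, "Green": 8, "Pink": 1, "Yellow": 3}
color_modes = modes(colors)

print(duckling_modes)
print(color_modes)
```

The Little Ducklings ages give two modes, 2 and 32, and the color table gives the single mode Green.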





Sharpen your pencil


Find the mode for the following sets of data. 


Values      1  2  3  4  5  6  7  8
Frequency   4  6  4  4  3  2  1  1


Category    Blue  Red  Green  Pink  Yellow
Frequency      4    5      8     1       3


Values      1  2  3  4  5
Frequency   2  3  3  3  3


When do you think the mode is most useful? 


When is the mode least useful? 





I kick butt 
at soccer and 
statistics. 


Median time spent 
underwater each day ： 
24 minutes 


I can run a
mile in a mean of 25
minutes, but that includes
a stop at Starbuzz Coffee
on the way.


Congratulations! 

Your efforts at the Health Club are proving to be a huge success, 
and demand for classes is high. 


My mean golf 
score is two under par. 
But don't tell the ladies 
my median score is two 
over par. 


I lose a mean of 7 teeth 
per hockey match. 




An experienced 
tennis coach like 
me earns a median 
salary of $33/hour 






sharpen your pencil solution 


Sharpen your pencil Solution


Find the mode for the following sets of data.


Values      1  2  3  4  5  6  7  8
Frequency   4  6  4  4  3  2  1  1

The mode here is 2, as it has the highest frequency.


Category    Blue  Red  Green  Pink  Yellow
Frequency      4    5      8     1       3

This time the mode is Green.


Values      1  2  3  4  5
Frequency   2  3  3  3  3

This set of data has several modes: 2, 3, 4, and 5.


When do you think the mode is most useful?

When the data set has a low number of modes, or when the
data is categorical instead of numerical. Neither the mean nor
the median can be used with categorical data.


When is the mode least useful?

When there are many modes.



Vital Statistics: Mode

The mode has to be in the data
set. It’s the only average that
works with categorical data.



Exercise


Complete the table below. For each type of average we’ve encountered in the chapter, write 
down how to calculate it, and then give the circumstances in which you should use each one. Try 
your hardest to fill this out without looking back through the chapter. 


Average 

How to calculate 

When to use it 

Mean (μ)


When the data is fairly symmetric 
and shows just the one trend. 

Median 



Mode 
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exercise solution 



Complete the table below. For each type of average we’ve encountered in the chapter, write 
down how to calculate it, and then give the circumstances in which you should use each one. Try 
your hardest to fill this out without looking back through the chapter. 


Average 

How to calculate 

When to use it 

Mean (μ)

How to calculate: Use either

    μ = Σx / n

where x is each value and n is the number of values, or

    μ = Σfx / Σf

where f is the frequency of each x.

When to use it: When the data is fairly symmetric
and shows just the one trend.

Median

How to calculate: Line all the values up in order.
If there are an odd number of values, the median
is the one in the middle.
If there are an even number of values,
add the two middle ones together, and
divide by two.

When to use it: When the data is skewed because of
outliers.

Mode

How to calculate: Choose the value(s) with the highest
frequency.
If the data is showing two clusters of
data, report a mode for each group.

When to use it: When you’re working with categorical
data.
When the data shows two or more
clusters of data.

The only type of average you can
calculate for categorical data is
the mode.


Sharpen your pencil


The generous CEO of Starbuzz Coffee wants to give all his employees a 
pay raise. He’s not sure whether to give everyone a straight $2,000 raise, 
or whether to increase salaries by 10%. The mean salary is $50,000, the 
median is $20,000, and the mode is $10,000. 


a) What happens to the mean, median, and mode if everyone at Starbuzz is given a $2,000 pay raise? 


b) What happens to the mean, median, and mode if everyone at Starbuzz is given a 10% pay raise instead?


c) Which sort of pay raise would you prefer if you were earning the mean wage? What about if you were on 
the same wage as the mode? 
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79 






sharpen your pencil solution 

Sharpen your pencil Solution


The generous CEO of Starbuzz Coffee wants to give all his employees a 
pay raise. He’s not sure whether to give everyone a straight $2,000 raise,
or whether to increase salaries by 10%. The mean salary is $50,000, the 
median is $20,000, and the mode is $10,000. 


a) What happens to the mean, median, and mode if everyone at Starbuzz is given a $2,000 pay raise? 


Mean: If x represents the original wages, and n the number of employees,

    μ = Σ(x + 2000) / n
      = Σx / n + Σ2000 / n        (there are n lots of 2000)
      = 50,000 + 2000n / n
      = 50,000 + 2,000
      = $52,000

Median: Every wage has $2,000 added to it, and this includes the middle value,
the median. The new median is

    $20,000 + $2,000 = $22,000.

Mode: The most common wage or mode is $10,000, and with the $2,000 pay raise,
this becomes

    $10,000 + $2,000 = $12,000.

Adding $2,000 to everyone’s salary increases the mean, median, and mode by
$2,000.


b) What happens to the mean, median, and mode if everyone at Starbuzz is given a 10% pay raise instead?


This time, all the wages are multiplied by 1.1 (which is 100% + 10%).

Mean:

    μ = Σ(1.1x) / n
      = 1.1 Σx / n
      = 1.1 × 50,000
      = $55,000

Median: Every wage is multiplied by 1.1, and this includes the middle value,
the median. The new median is

    $20,000 × 1.1 = $22,000.

Mode: The most common wage or mode is $10,000, and if we multiply this by 1.1,
it becomes

    $10,000 × 1.1 = $11,000.

Increasing everyone’s salary by 10% increases the mean, median, and mode by
10%.


c) Which sort of pay raise would you prefer if you were earning the mean wage? What about if you
were on the same wage as the mode?


If you earn the mean wage, you’ll get a larger pay increase if you get
a 10% pay raise. If you earn the mode wage, you’ll get more money if
you ask for the straight $2,000 pay increase.
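
To try both kinds of raise on actual numbers, here is a small Python sketch. The five salaries are invented for illustration (the exercise never lists individual wages); they were chosen only so that the mean is $50,000, the median $20,000, and the mode $10,000. The helper name `averages` is also an assumption.

```python
from statistics import mean, median, mode

# Hypothetical payroll, chosen to match the averages quoted in the exercise.
salaries = [10_000, 10_000, 20_000, 30_000, 180_000]


def averages(wages):
    """Return (mean, median, mode) for a list of wages."""
    return mean(wages), median(wages), mode(wages)


# a) Everyone gets a straight $2,000 raise.
flat_raise = [wage + 2_000 for wage in salaries]

# b) Everyone gets a 10% raise (integer arithmetic keeps the amounts exact).
percent_raise = [wage + wage // 10 for wage in salaries]

original = averages(salaries)
after_flat = averages(flat_raise)
after_percent = averages(percent_raise)

print("Original:     ", original)
print("After $2,000: ", after_flat)
print("After 10%:    ", after_percent)
```

Adding a constant shifts all three averages by that constant, and multiplying by 1.1 scales all three by 1.1. The two raises are worth the same only to someone earning $20,000, since 10% of $20,000 is $2,000.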


The Case of the Ambiguous Average: Solved 

What’s going on with the average? Who do you think 
is right? 


The workers, the managers, and the CEO are
each using a different sort of average.

The workers are using the median, which
minimizes the effect of the CEO’s salary.

The managers are using the mean. The large
salary of the CEO is skewing the data to the right,
which is making the mean artificially high.



The CEO is using the mode. Most workers are paid $500 per week,
and so this is the mode of the salaries.


So who’s right? In a sense, they all are, although it has to be said
that each group of people is using the average that best supports
what they want. Remember, statistics can be informative, but 
they can also be misleading. For balance, we think that the most 
appropriate average to use in this situation is the median because 
of the outliers in the data. 
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3 measuring Variability and spread 

Power Ranges



Not everything’s reliable, but how can you tell? 

Averages do a great job of giving you a typical value in your data set, but they 
don’t tell you the full story. OK, so you know where the center of your data 
is, but often the mean, median, and mode alone aren’t enough information 
to go on when you’re summarizing a data set. In this chapter, we’ll show you
how to take your data skills to the next level as we begin to analyze ranges 
and variation. 




introducing the statsville all stars 


Wanted: one player

The Statsville All Stars are the hottest basketball team in the 
neighborhood, and they’re the favorite to win this year’s league. 
There’s only one problem — due to a freak accident, they’re a 
player down. They need a new team member, and fast. 

The new recruit must be good all-round, but what the coach 
really needs is a reliable shooter. If he can trust the player’s 
ability to get the ball in the basket, they’re on the team. 

The coach has been conducting trials all week, and he’s down 
to three players. The question is, which one should he choose? 


All three players have 
the same average score 
for shooting, but I need some 
way of choosing between them. 
Think you can help? 


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measuring variability and spread 


We need to compare player scores

Here are the scores of the three players: 





Points scored per game   7  8  9  10  11  12  13
Frequency                1  1  2   2   2   1   1

Frequency tells us the number of games
where the player got each score. This player
scored 9 points in 2 games, and 12 points in
1 game.


Points scored per game   7  9  10  11  13
Frequency                1  2   4   2   1


Points scored per game   3  6  7  10  11  13  30
Frequency                2  1  2   3   1   1   1

Each player has a mean, median, and mode score of 10 points, but if you look at 
their scores, you’ll see they’ve all achieved it in different ways. There’s a difference in 
how consistently the players have performed, which the average can’t measure. 

What we need is a way of differentiating between the three sets of scores so that we 
can pick the most suitable player for the team. We need some way of comparing the 
sets of data in addition to the average — but what? 





What information in addition to the average 
would help the coach make his decision? 
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range measures data width 


[Chart: Basketball player scores]


Use the range to differentiate between
data sets 

So far we’ve looked at calculating averages for sets of data, but quite 
often, the average only gives part of the picture. Averages give us a way of 
determining where the center of a set of data is, but they don’t tell us how 
the data varies. Each player has the same average score, but there are clear 
differences between each data set. We need some other way of measuring 
these differences. 


We can differentiate between each set of data by looking at the way in which 
the scores spread out from the average. Each player’s scores are distributed 
differently, and if we can measure how the scores are dispersed, the coach 
will be able to make a more informed decision. 


Measuring the range 


We can easily do this by calculating the range. The range tells us over 
how many numbers the data extends, a bit like measuring its width. To 
find the range, we take the largest number in the data set, and then 
subtract the smallest. 

The smallest value is called the lower bound, and the largest value is
the upper bound.

Let’s take a look at the set of scores for one of the players and see how 
this works. Here are the scores: 


[Chart: the player’s scores, with the lower bound marked at 7 and the
upper bound marked at 13]


To calculate the range, we subtract the lower bound from the upper 
bound. Looking at the data, the smallest value is 7, which means that 
this is the lower bound. Similarly, the upper bound is the largest value, 
or 13. Subtracting the lower bound from the upper bound gives us: 

Range = upper bound - lower bound
      = 13 - 7
      = 6

so the range of this set of data is 6.

The range is a simple and easy way of measuring how spread out 
values are, and it gives us another way of comparing sets of data. 
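
The calculation above can be sketched in a few lines of Python. The list of
scores is an assumption: it is expanded from the frequency table whose points
run from 7 to 13, which matches the bounds used in the worked example. The
function name `data_range` is just an illustrative choice.

```python
# Scores assumed from the frequency table with points 7, 9, 10, 11, 13
# and frequencies 1, 2, 4, 2, 1 (lower bound 7, upper bound 13).
scores = [7, 9, 9, 10, 10, 10, 10, 11, 11, 13]


def data_range(values):
    """Return the upper bound minus the lower bound."""
    lower_bound = min(values)
    upper_bound = max(values)
    return upper_bound - lower_bound


score_range = data_range(scores)
print("Range:", score_range)
```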



The mean tells us nothing about how spread out the data is, so we need
some other measure to tell us this.


Vital Statistics

Range

The range is a way of measuring how spread out a set of values are.
It’s given by

Upper bound - Lower bound

where the upper bound is the highest value, and the lower bound the
lowest.
measuring variability and spread


Exercise


Work out the mean, lower bound, upper bound, and range for the following sets of data, and 
sketch the charts. Are values dispersed in the same way? Does the range help us describe 
these differences? 


Score        8    9    10    11    12
Frequency    1    2     3     2     1


Score        8    9    10    11    12
Frequency    1    0     8     0     1
exercise solution


Work out the mean, lower bound, upper bound, and range for the following sets of basketball 
scores, and sketch the charts. Are values dispersed in the same way? Does the range help us 
describe these differences? 


Score        8    9    10    11    12
Frequency    1    2     3     2     1

Mean = 10
Lower bound = 8
Upper bound = 12
Range = 12 - 8
      = 4


Score        8    9    10    11    12
Frequency    1    0     8     0     1

Mean = 10
Lower bound = 8
Upper bound = 12
Range = 12 - 8
      = 4

[Charts: frequency against score for each data set]



Both data sets above have the same 
range, but the values are distributed 
differently. I wonder if the range really gives 
us the full story about measuring spread? 


The range only describes the width of the data, 
not how it’s dispersed between the bounds. 

Both sets of data above have the same range, but the second 
set has outliers — extreme high and low values. It looks like the 
range can measure how far the values are spread out, but it’s 
difficult to get a real picture of how the data is distributed. 
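
As a rough check of this point, the sketch below expands the two frequency
tables from the exercise into plain lists and compares them. The helper name
`expand` and the dictionary layout are illustrative assumptions, not anything
the book prescribes.

```python
def expand(freq_table):
    """Turn a {score: frequency} table into a sorted list of scores."""
    return [score
            for score, count in sorted(freq_table.items())
            for _ in range(count)]


first_set = expand({8: 1, 9: 2, 10: 3, 11: 2, 12: 1})
second_set = expand({8: 1, 9: 0, 10: 8, 11: 0, 12: 1})

range_first = max(first_set) - min(first_set)
range_second = max(second_set) - min(second_set)

# Same range, but very different shares of values sitting on the mean.
share_at_10_first = first_set.count(10) / len(first_set)
share_at_10_second = second_set.count(10) / len(second_set)

print(range_first, range_second)
print(share_at_10_first, share_at_10_second)
```

Both ranges come out the same even though most of the second set sits on a
single value, which is exactly the information the range cannot show.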





The problem with outliers 


The range is a simple way of saying what the spread of a set of 
data is, but it’s often not the best way of measuring how the data 
is distributed within that range. If your data has outliers, using 
the range to describe how your values are dispersed can be very 
misleading because of its sensitivity to outliers. Let’s see how. 


Imagine you have a set of numbers as follows: 


1 1 1 2 2 2 2 3 3 3 3 3 4 4 4 4 5 5 5

(The lower bound is 1, and the upper bound is 5.)


Here, numbers are fairly evenly distributed between the lower 
bound and upper bound, and there are no outliers for us to worry 
about. The range of this set of numbers is 4. 

But what happens if we introduce an outlier, like the number 10? 


1 1 1 2 2 2 2 3 3 3 3 3 4 4 4 4 5 5 5 10

(The lower bound is still 1, but the upper bound has increased to 10.)


Our lower bound is the same, but the upper bound has gone up to 
10, giving us a new range of 9. The range has increased by 5 just 
because we added one extra number, an outlier. 

Without the outlier, the two sets of data would be identical, so why 
should there be such a big difference in how we describe how the 
values are distributed? 
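
A small sketch makes the jump easy to see. It rebuilds the 19 numbers from
the example, adds the outlier, and compares the two ranges; the variable and
function names are illustrative assumptions.

```python
def data_range(values):
    """Upper bound minus lower bound."""
    return max(values) - min(values)


# The 19 numbers from the example: three 1s, four 2s, five 3s, four 4s, three 5s.
without_outlier = [1] * 3 + [2] * 4 + [3] * 5 + [4] * 4 + [5] * 3

# The same data with a single extra value, the outlier 10.
with_outlier = without_outlier + [10]

range_before = data_range(without_outlier)
range_after = data_range(with_outlier)

print("Range without outlier:", range_before)
print("Range with outlier:", range_after)
print("Increase:", range_after - range_before)
```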


[Chart: the data on a vertical line chart (a type of bar chart that
uses lines instead of bars)]







Can you think of a way in which we can construct 
a range that’s less sensitive to outliers? 
range: uses and limitations




The range is a great quick-and-dirty way to get an idea 
of how values are distributed, but it’s a bit limited. 

The range tells you how far apart your highest and lowest values are, but 
that’s about it. It only provides a very basic idea of how the values are 
distributed. 

The primary problem with the range is that it only describes the width of 
your data. Because the range is calculated using the most extreme values 
of the data, it’s impossible to tell what that data actually looks like — and 
whether it contains outliers. There are many different ways of constructing 
the same range, and sometimes this additional information is important. 


If the range is 
so limited, why do 
people use it? 


Mainly because it’s so simple. 

The range is so simple that it’s easily understood by lots 
of people, even those who have had very little exposure 
to statistics. If you talk about a range of ages, for 
example, people will easily understand what you mean. 

Be careful, though, because there’s danger in its pure 
simplicity. As the range doesn’t give the full picture of 
what’s going on between the highest and lowest values, 
it’s easy for it to be used to give a misleading impression 
of the underlying data. 



We need to get away from outliers

The main problem with the range is that, by definition, it includes outliers. If
data has outliers, the range will include them, even though there may be only
one or two extreme values. What we need is a way of negating the impact of
these outliers so that we can best describe how values are dispersed.

One way out of this problem is to look at a kind of mini range, one that
ignores the outliers. Instead of measuring the range of the whole set of data,
we can find the range of part of it, the part that doesn’t contain outliers.


Wait a sec, do you mean
we pretend the outliers
don’t exist? That doesn’t
sound very scientific.


We need a consistent way of doing this.

One of the problems with ignoring outliers on an ad hoc basis is
that it’s difficult to compare sets of data. How do we know that
all sets of data are omitting outliers in exactly the same way?

We need to make sure that we use the same mini range definition
for all the sets of data we’re comparing. But how?









quartiles and the interquartile range 


Quartiles come to the rescue 


One way of constructing a mini range is to just use values around the center of 
the data. We can construct a range in this way by first lining up the values in 
ascending order, and then splitting the data into four equally sized chunks, with 
each chunk containing one quarter of the data. 




[Diagram: this is the same data as before, but this time it’s split
into quarters.]



We can then construct a range using the values that fall between the two outer splits: 



[Diagram: the range between these values gives us a brand new mini
range.]

The values that split the data into equal chunks are known as quartiles, 
as they split the data into quarters. Finding quartiles is a bit like finding 
the median. Instead of finding the value that splits the data in half, we’re 
finding the values that split the data into quarters. 

The lowest quartile is known as the lower quartile, or first quartile
(Q1), and the highest quartile is known as the upper quartile, or third
quartile (Q3). The quartile in the middle (Q2) is the median, as it splits
the data in half. The range of the values between the lower and upper
quartiles is called the interquartile range (IQR).



Watch it!

Some textbooks refer to quartiles as the set of values within each
quarter of the data.

We’re not. We’re using the term quartile to specifically refer to
the values that split the data into quarters.


Interquartile range = Upper quartile - Lower quartile

The interquartile range gives us a standard, 
repeatable way of measuring how values are 
dispersed. It’s another way in which we can 
compare different sets of data. But what about 
outliers? Does the interquartile range help us 
deal with these too? Let’s take a look. 



Vital Statistics

Quartiles

Quartiles are values that split your data into quarters. The lowest
quartile is called the lower quartile, and the highest quartile is
called the upper quartile.

The middle quartile is the median.




The interquartile range excludes outliers 

The good thing about the interquartile range is that it’s a lot less 
sensitive to outliers than the range is. 


The upper and lower quartiles are positioned so that the lower quartile 
has 25% of the data below it, and the upper quartile has 25% of the 
data above it. This means that the interquartile range only uses the 
central 50% of the data, so outliers are disregarded. As we’ve said 
before, outliers are extreme high or low values in the data, so by only 
considering values around the center of the data, we automatically 
exclude any outliers. 


Here’s our data again. Can you see how the interquartile range
effectively ignores any outliers?

[Diagram: the interquartile range includes the middle 50% of the data
and excludes the two outer quarters, each holding 25% of the data set,
where outliers live.]


As the interquartile range only uses the central 50% of the data, 
outliers are excluded irrespective of whether they are extremely high or 
extremely low. They can’t be in the middle. This means that any outliers 
in the data are effectively cut out. 


Vital Statistics

Interquartile range

A “mini range” that’s less sensitive to outliers. You find it by
calculating

Upper quartile - Lower quartile

Outliers are always extreme high or low values, and the interquartile
range cuts these out.



Excluding the outliers with the interquartile range means 
that we now have a way of comparing different sets of data 
without our results being distorted by outliers. Before we can 
figure out the interquartile range, though, we have to work 
out what the quartiles are. Flip the page, and we’ll show you 
how it’s done. 
a closer look at quartiles



Quartile anatomy

Finding the quartiles of a set of data is a very similar process to finding the median. 
If you line all of your values up in ascending order, the median is the value right in 
the very center. If you have n numbers, it’s the number that’s at position (n + 1) ÷ 2,
and if this falls halfway between two numbers, you take their average. 

If we then further split the data into quarters, the quartiles are the values at each of 
these splits. The lowest is the lower quartile, and the highest is the upper quartile: 


[Diagram: the ordered data running from the lower bound to the upper
bound, split at the lower quartile (Q1), the median (Q2), and the upper
quartile (Q3)]


Finding the position of the quartiles is slightly trickier than finding the position of the 
median, as we need to make sure the values we choose keep the data split into the 
right proportions. There is a way of doing it though; let’s start with the lower quartile. 


Finding the position of the lower quartile

1. First, start off by calculating n ÷ 4.

2. If this gives you an integer, then the lower quartile is positioned halfway
   between this position and the next one. Take the average of the
   numbers at these two positions to get your lower quartile.

3. If n ÷ 4 is not an integer, then round it up. This gives you the position of
   the lower quartile.


As an example, if you have 6 numbers, start off by calculating 6 ÷ 4, which
gives you 1.5. Rounding this number up gives you 2, which means that the lower
quartile is at position 2.


Finding the position of the upper quartile

1. Start off by calculating 3n ÷ 4.

2. If it’s an integer, then the upper quartile is positioned halfway between
   this position and the next. Add the two numbers at these positions
   together and divide by 2.

3. If 3n ÷ 4 is not an integer, then round it up. This new number gives you
   the position of the upper quartile.
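
The two sets of steps above can be sketched as a single Python helper. This
is only one possible reading of the rule: the function name `quartile`, its
`which` parameter, and the sample lists are assumptions made for
illustration. Note that statistics libraries often use slightly different
quartile conventions, so their results may not match this rule exactly.

```python
def quartile(sorted_values, which):
    """Find a quartile using the positional rule described above.

    sorted_values: a non-empty list already in ascending order.
    which: 1 for the lower quartile (n / 4), 3 for the upper (3n / 4).
    """
    n = len(sorted_values)
    # Integer arithmetic avoids floating-point surprises when testing
    # whether which * n / 4 is a whole number.
    whole, remainder = divmod(which * n, 4)
    if remainder == 0:
        # Whole-number position: average the values at this position
        # and the next one (positions are 1-based, list indexes 0-based).
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Otherwise round the position up: 1-based position whole + 1.
    return sorted_values[whole]


six_numbers = [2, 4, 6, 8, 10, 12]          # hypothetical data, n = 6
eight_numbers = [1, 2, 3, 4, 5, 6, 7, 8]    # hypothetical data, n = 8

lower_six = quartile(six_numbers, 1)     # 6 / 4 = 1.5, so position 2
upper_six = quartile(six_numbers, 3)     # 18 / 4 = 4.5, so position 5
lower_eight = quartile(eight_numbers, 1) # 8 / 4 = 2, average positions 2 and 3
upper_eight = quartile(eight_numbers, 3) # 24 / 4 = 6, average positions 6 and 7

print(lower_six, upper_six, lower_eight, upper_eight)
```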
Exercise


It’s time to put your quartile skills into practice. Here are the scores for one of the players:


Points scored per game    3    6    7    10    11    13    30
Frequency                 2    1    2     3     1     1     1


1. What’s the range of this set of data? 



2. What are the lower and upper quartiles? 


3. What’s the interquartile range? 
exercise solution



Here are the scores for one of the players: 


Points scored per game    3    6    7    10    11    13    30
Frequency                 2    1    2     3     1     1     1


1. What’s the range of this set of data? 


The lower bound of this set of data is 3, as that’s the lowest number of
points scored. The upper bound is 30, as that’s the highest. This gives us

Range = upper bound - lower bound
      = 30 - 3
      = 27


2. What are the lower and upper quartiles?

Let’s start with the lower quartile. There are 11 numbers, and calculating
11 ÷ 4 gives us 2.75. Rounding this number up gives us the position of the
lower quartile, so the lower quartile is at position 3. This means the lower
quartile is 6.

Now let’s find the upper quartile. 3 × 11 ÷ 4 gives us 8.25, and rounding
this up gives us 9, so the upper quartile is at position 9. This means the
upper quartile is 11.

3  3  (6)  7  7  (10)  10  10  (11)  13  30

Lower quartile = 6, median = 10, upper quartile = 11

3. What’s the interquartile range? 

The interquartile range is the lower quartile subtracted from the upper
quartile.

Interquartile range = upper quartile - lower quartile
                    = 11 - 6
                    = 5
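
If you want to check the three answers, the sketch below works them out from
the frequency table. It follows the positional quartile rule from the
previous pages; the names `quartile`, `frequencies`, and `scores` are
illustrative assumptions.

```python
def quartile(sorted_values, which):
    """Positional rule: which = 1 for the lower quartile, 3 for the upper."""
    whole, remainder = divmod(which * len(sorted_values), 4)
    if remainder == 0:
        # Whole-number position: average this value and the next one.
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Otherwise round the position up.
    return sorted_values[whole]


# Points scored per game mapped to how often each score occurred.
frequencies = {3: 2, 6: 1, 7: 2, 10: 3, 11: 1, 13: 1, 30: 1}

# Expand the table into the 11 individual scores, in ascending order.
scores = [points
          for points, count in sorted(frequencies.items())
          for _ in range(count)]

score_range = max(scores) - min(scores)
lower_quartile = quartile(scores, 1)
upper_quartile = quartile(scores, 3)
interquartile_range = upper_quartile - lower_quartile

print("Range:", score_range)
print("Quartiles:", lower_quartile, upper_quartile)
print("Interquartile range:", interquartile_range)
```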



Q: I get why mean, median, and mode are useful, but why
do I need to know how the data is spread out?

A: Averages offer you only a one-dimensional view of your
data. They tell you what the center of your data is, but that’s it.
While this can be useful, it’s often not enough. You need some
other way of summarizing your data in addition to the average.

Q: So is the median the same as the interquartile range?

A: No. The median is the middle value of the data, and the
interquartile range is the range of the middle 50% of the values.

Q: What’s the point of all this quartiles stuff? It seems like
a really tedious way to calculate ranges.

A: The problem with using the range to measure how your
data is dispersed is that it’s very sensitive to outliers. It gives you
the difference between the lower and upper bounds of your data,
but just one outlier can make a huge difference to the result.

We can get around this by focusing only on the central 50% of the
data, as this excludes outliers. This means finding quartiles, and
using the interquartile range. So even though finding quartiles
is trickier than finding the lower and upper bounds, there are
definite advantages.

Q: Should I always use the interquartile range to measure
the spread of data?

A: In a lot of cases, the interquartile range is more meaningful
than the range, but it all depends on what information you
really need. There are other ways of measuring how values are
dispersed that you might want to consider too; we’ll come to
these later.

Q: Would I ever want to look at just one quartile of my
data instead of the range or the interquartile range?

A: It’s possible. For example, you might be interested in what
the high values look like, so you’d just look at what values are in
the upper quarter of your data set, using the upper quartile as a
cut-off point.

Q: Would I ever want to break my data into smaller pieces
than quarters? How about breaking my data into, say, 10
pieces instead of 4?

A: Yes, there are times when you might want to do this. Turn
the page, and we’ll show you more...


Bullet Points

■ The upper and lower bounds of the data are 
the highest and lowest values in the data set. 

■ The range is a simple way of measuring how 
values are dispersed. It’s given by: 

range = upper bound - lower bound

■ The range is very sensitive to outliers. 

■ The interquartile range is less sensitive to 
outliers than the range. 


■ Quartiles are values that split your data 
into quarters. The highest quartile is called 
the upper quartile, and the lowest quartile is 
called the lower quartile. The middle quartile 
is the median. 

■ The interquartile range is the range of 
the central 50% of the data. It’s given by 
calculating 

upper quartile - lower quartile 
splitting data into percentiles


We’re not just limited to quartiles

So far we’ve looked at how the range and interquartile range give us 
ways of measuring how values are dispersed in a set of data. The range 
is the difference between the highest number and the lowest, while the 
interquartile range focuses on the middle 50% of the data. 





So are they the only 
sorts of ranges I can use? 
Do I get any other options? 


There are other sorts of ranges we can use in 
addition to the range and interquartile range. 

Our original problem with the range was that it’s extremely sensitive to 
outliers. To get around this, we divided the data into quarters, and we 
used the interquartile range to provide us with a cut-down range of the 
data. 

While the interquartile range is quite common, it’s not the only way of 
constructing a mini range. Instead of splitting the data into quarters, 
we could have split it into some other sort of percentage and used that 
for our range instead. 

As an example, suppose we’d divided our set of data into tenths instead
of quarters so that each segment contains 10% of the data. We’d have
something like this:

[Diagram: the data split into ten segments, each containing 10% of the
data. We can use these divisions to create a brand new mini range.]


If you break up a set of data into percentages, the values that split the 
data are called percentiles. In the case above, our data is split into 
tenths, so the values are called deciles. 

We can use percentiles to construct a new range called the
interpercentile range.
So what are percentiles?


Percentiles are values that split your data into percentages in the same way that quartiles split data
into quarters. Each percentile is referred to by the percentage with which it splits the data, so the 10th
percentile is the value that is 10% of the way through the data. In general, the kth percentile is the
value that is k% of the way through the data. It’s usually denoted by P_k.


[Diagram: P_k is the value k% of the way through your data]


Quartiles are actually a type of percentile. The lower quartile is P_25,
and the upper quartile is P_75. The median is P_50.

Percentile uses 

Even though the interpercentile range isn’t that commonly used, the 
percentiles themselves are useful for benchmarking and determining 
rank or position. They enable you to determine how high a particular 
value is relative to all the others. As an example, suppose you heard you 
scored 50 on your statistics test. With just that number by itself, you’d 
have no idea how well you’d done relative to anyone else. But if you 
were told that the 90th percentile for the exam was 50, you’d know that 
you scored the same as or better than 90% of the other people. 

Finding percentiles 

You can find percentiles in a similar way to how you find quartiles.

1. First of all, line all your values up in ascending order.

2. To find the position of the kth percentile out of n numbers,
   start off by calculating k(n ÷ 100).

3. If this gives you an integer, then your percentile is halfway
   between the value at position k(n ÷ 100) and the next number
   along. Take the average of the numbers at these two positions
   to give you your percentile.

4. If k(n ÷ 100) is not an integer, then round it up. This then gives
   you the position of the percentile.

As an example, if you have 125 numbers and want to find the 10th
percentile, start off by calculating 10 × 125 ÷ 100. This gives you a
value of 12.5. Rounding this number up gives you 13, which means
that the 10th percentile is the number at position 13.
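
The percentile steps can be sketched in Python as follows. The function name
`percentile` and the 125 made-up test scores are assumptions for
illustration, and the code only handles k between 1 and 99. Library
percentile functions usually interpolate differently, so treat this as a
sketch of the rule described here rather than a standard definition.

```python
def percentile(values, k):
    """Return the kth percentile using the positional rule above."""
    if not 0 < k < 100:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(values)          # line the values up in ascending order
    n = len(ordered)
    # Work out k * n / 100 with integers so the whole-number test is exact.
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # Whole-number position: average this value and the next one.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise round up to the next position (1-based).
    return ordered[whole]


# 125 hypothetical scores, chosen so the value equals its position.
test_scores = list(range(1, 126))

p10 = percentile(test_scores, 10)   # 10 x 125 / 100 = 12.5, so position 13
p20 = percentile(test_scores, 20)   # 20 x 125 / 100 = 25, average positions 25 and 26

print(p10, p20)
```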


[Chart: Statistics test scores]


Vital Statistics

Percentile

The kth percentile is the value k% of the way through your data.
It’s denoted by P_k.
box and whisker plots


Box and whisker plots let you visualize ranges


We’ve talked a lot about different sorts of ranges, and it would be 
useful to be able to compare the ranges of different sets of data in a 
visual way. There’s a chart that specializes in showing different types 
of ranges: the box and whisker diagram, or box plot. 

A box and whisker diagram shows the range, interquartile range, and
median of a set of data. More than one set of data can be represented
on the same chart, which means it’s a great way of comparing data
sets.



To create a box and whisker diagram, first you draw a box against a 
scale with the left and right sides of the box representing the lower 
and upper quartiles, respectively. Then, draw a line inside the box to 
mark the value of the median. This box shows you the extent of the 
interquartile range. After that, you draw “whiskers” to either side of 
your box to show the lower and upper bounds and the extent of the 
range. Here’s a box and whisker diagram for the scores of our player 
from page 95: 



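[Box and whisker diagram: Basketball player scores, showing the lower
bound, lower quartile, median, upper quartile, and upper bound.
Here’s a reminder of the data: 3 3 6 7 7 10 10 10 11 13 30]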



If your data has outliers, the range will be wider. On a box 
and whisker diagram, the length of the whiskers increases in 
line with the upper and lower bounds. You can get an idea of 
how data is skewed by looking at the whiskers on the box and 
whisker diagram. 


If the box and whisker diagram is symmetric, this means that 
the underlying data is likely to be fairly symmetric, too. 
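
The five values a box and whisker diagram needs can be worked out before any
drawing happens. The sketch below does this for the player’s scores, using
the positional quartile rule from earlier in the chapter; the function names
and the dictionary keys are illustrative assumptions.

```python
from statistics import median


def quartile(sorted_values, which):
    """Positional rule: which = 1 for the lower quartile, 3 for the upper."""
    whole, remainder = divmod(which * len(sorted_values), 4)
    if remainder == 0:
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    return sorted_values[whole]


def box_plot_values(values):
    """Return the five values a box and whisker diagram displays."""
    ordered = sorted(values)
    return {
        "lower bound": ordered[0],                 # end of the left whisker
        "lower quartile": quartile(ordered, 1),    # left side of the box
        "median": median(ordered),                 # line inside the box
        "upper quartile": quartile(ordered, 3),    # right side of the box
        "upper bound": ordered[-1],                # end of the right whisker
    }


player_scores = [3, 3, 6, 7, 7, 10, 10, 10, 11, 13, 30]
summary = box_plot_values(player_scores)

for name, value in summary.items():
    print(name, "=", value)
```

Here the right whisker (11 to 30) is far longer than the left one (3 to 6),
which is the kind of lopsidedness the diagram makes visible.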



So box and whisker 
diagrams are really just 
a neat way of showing 
ranges and quartiles. 





Here are box and whisker diagrams for two more basketball players. Compare the ranges of
their scores. If you had to choose between having player A or player B on the team, which would
you pick? Why?

[Box and whisker diagrams: Basketball Player A and B scores, plotted
against Score]



There are no Dumb Questions


Q: I’m sure I’ve seen box and whisker diagrams that look a
bit different than this.

A: There are actually several versions of box and whisker
diagrams. Some have deliberately shorter whiskers and explicitly
show outliers as dots or stars extending beyond the whiskers. This
makes it easier to see how many outliers there are and how extreme
they really are. Other diagrams show the mean as a dot, so you can
see where it’s positioned in relation to the median. If you’re taking a
statistics course, it would be a good idea to check which version of
the box and whisker diagram is likely to be used.

Q: So if you show the mean as a dot, is it to the left or right
of the median?

A: If the data is skewed to the right, then the mean will be to the
right of the median, and the whisker on the right will be longer than
that of the left. If the data is skewed to the left, the mean will be to
the left of the median, and the whisker on the left will be the longest.
Exercise Solution


Here are box and whisker diagrams for each basketball player. Compare the ranges of their 
scores. If you had to choose between having player A or player B on the team, which would you 
pick? Why? 

[Box and whisker diagrams: Basketball Player A and B scores, plotted
against Score on a scale from 0 to 30]

Player A has a relatively small range, and his median score is a bit
higher than Player B’s.

Player B has a very large range. Sometimes this player scores a lot
higher than Player A, but sometimes a lot lower.

Player A plays more consistently and usually scores higher than Player B
(compare the medians and interquartile ranges), so we’d pick Player A.

Bullet Points

■ Percentiles split your data into percentages. 
They’re useful for benchmarking. 

■ The kth percentile is k% of the way through
your data. It’s denoted by P_k.

■ An interpercentile range is like the 
interquartile range but, this time, between two 
percentiles. 


■ Box and whisker diagrams, or box plots, are 
a useful way of showing ranges and quartiles 
on a chart. A box shows where the quartiles 
and interquartile range are, and the whiskers 
give the upper and lower bounds. More than 
one set of data can be shown on the same 
chart, so they’re useful for comparisons. 



The interquartile range looks useful, but 
what about players who sometimes get really 
low scores? If a player messes up on game day, 
it could cost us the league! I’m not sure that the
range or the interquartile range tell me which 
player is really the most consistent. 


The coach doesn’t just need to compare the range of the 
players’ scores; he needs some way of more accurately 
measuring where most of the values lie to help him 
determine which player he can truly rely on come game day. 
In other words, he needs to find the player whose scores vary 
the least. 

The problem with the range and interquartile range is that 
they only tell you the difference between high and low values. 
What they don’t tell you is how often the players get these high 
or low scores versus scores closer to the center — and that’s 
important to the coach. 

The coach needs a team of players he can rely on. The last 
thing he wants is an erratic player who will play well one 
week and score badly the next. 

What can we do to help the coach make his decision? 


How can we more accurately measure variability?
exploring variability


Variability is more than just spread

We don’t just want to measure the spread of each set of scores;
we want some way of using this to see how reliable the player is.
In other words, we want to be able to measure the variability of
the players’ scores.

One way of achieving this is to look at how far away each value
is from the mean. If we can work out some sort of average
distance from the mean for the values, we have a way of
measuring variation and spread. The smaller the result, the closer
values are to the mean. Let’s take a look at this.


[Chart: Player 1’s basketball scores, frequency against score]

The values here are spread out quite a long way from
the mean. If the coach picks this player for the team,
he’s unlikely to be able to predict how the player will
perform on game day. The player may achieve a
very high score if he’s having a good day. On a bad
day, however, he may not score highly at all, and that
means he’ll potentially lose the game for the team.


[Chart: Player 2’s basketball scores, frequency against score]

The values for this second set of data are much closer
to the mean and vary less. If the coach picks this
player, he’ll have a good idea of how well the player
is likely to perform in each game.


So does that mean we
just calculate the average
distance from the mean?


Let’s find out.



Calculating average distances 


Imagine you have three numbers: 1, 2, and 9. The mean 
is 4. What happens if we find the average distance of 
values from the mean? 


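Average distance = [(1 to μ) + (2 to μ) + (9 to μ)] ÷ 3
                 = [3 + 2 + (-5)] ÷ 3
                 = 0

[Diagram: a number line showing 1, 2, μ = 4, and 9. The distances from
1 and 2 to μ are 3 and 2; the distance from 9 to μ is -5. These
distances cancel each other out.]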


The average distance of values from the mean is always 
0. The positive and negative distances cancel each other 
out. So what can we do now? 
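
You can watch the cancellation happen with a short Python sketch. The
distances are measured as mean minus value, matching the signs used in the
calculation above; the variable names are illustrative assumptions.

```python
values = [1, 2, 9]

# The mean of 1, 2, and 9 is 12 / 3 = 4.
mean = sum(values) / len(values)

# Signed distance from each value to the mean (mean minus value).
distances = [mean - x for x in values]

# The positive and negative distances cancel when averaged.
average_distance = sum(distances) / len(values)

print("Mean:", mean)
print("Distances:", distances)
print("Average distance:", average_distance)
```

With other data sets the total can come out as a tiny nonzero number because
of floating-point rounding, but mathematically it is always 0.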



There are no Dumb Questions


Q: Why do we have -5 in that
equation? I would have thought the
distance would be 5. Why is it negative?

A: The distance from 9 to μ is negative
because μ is less than 9. 1 and 2 are both
less than μ, so the distance is positive for
both of them. That’s why the distances
cancel each other out.

Q: Surely the distances don’t cancel
out for all values. Maybe we were just
unlucky.

A: No matter what values you choose, the
distances to the mean will always cancel out.

Here’s a challenge for you: take a group of
numbers, work out the mean, work out the
distance of each value from the mean, and
then add the distances together. The result
will be 0 every time.

Q: Can’t we just take the positive
distances and average those?

A: That sounds intuitive, but in practice,
statisticians rarely do this. There’s another
way of making sure that the distances don’t
cancel out, and you’ll see that very soon.
This other way of determining how close
typical values are to the mean is used a lot
in statistics, and you’ll see it through most of
the rest of the book.

Q: Can’t I use the interquartile range
to see how reliable the scores are?

A: The interquartile range only uses part
of the data for measuring spread. If a player
has one bad score, this will be excluded
by the interquartile range. In order to truly
determine reliability and consistency, we
need to consider all the scores.

Q: The range uses all the scores. Why
can’t we use that then?

A: The range is only really good for
describing the difference between the
highest and lowest number. As you saw
earlier, this doesn’t represent how the values
are actually distributed. We need another
measure to do this.


The positive and negative distances from the mean
cancel each other out.
variance and standard deviation measure variability


We can calculate variation with the variance

We want a way to measure the average distance of values from 
the mean in a way that stops the distances from cancelling each 
other out. 



We need a way of making all the
numbers positive. Maybe it’ll work if
we square the distances first. Then
each number is bound to be positive.


Let’s try this with the same three numbers. 


Average (distance)² = [(1 to μ)² + (2 to μ)² + (9 to μ)²] ÷ 3
                    = [3² + 2² + (-5)²] ÷ 3
                    = (9 + 4 + 25) ÷ 3
                    = 12.67 (to 2 decimal places)

(Remember that μ = 4.)


This time we get a meaningful number, as the distances don’t 
cancel each other out. Every number we add together has to 
be non-negative because we’re squaring the distance from the 
mean. Adding these numbers together gives us a non-negative 
result — every time. 
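
Here is a minimal sketch of the same calculation in Python. The function
name `variance` is an illustrative choice; the comparison with
`statistics.pvariance` from the standard library is included only as a
cross-check, since that function also divides by the number of values.

```python
import statistics


def variance(values):
    """Average of the squared distances from the mean (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    squared_distances = [(x - mean) ** 2 for x in values]
    return sum(squared_distances) / n


values = [1, 2, 9]
result = variance(values)

print("Variance:", round(result, 2))
print("Library check:", round(statistics.pvariance(values), 2))
```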



This method of measuring spread is called the variance, and 
it’s a very common way of describing the spread of a set of data. 


Here’s a general form of the equation: 



Variance = Σ(x - μ)² ÷ n

(The variance is the average of the distances from the mean squared.)


Vital Statistics

Variance

The variance is a way of measuring spread, and it’s the
average of the distance of values from the mean squared.

Σ(x - μ)² ÷ n


but standard deviation is a more intuitive measure 


Statisticians use the variance a lot as a means of measuring the 
spread of data. It’s useful because it uses every value to come up 
with the result, and it can be thought of as the average of the 
distances from the mean squared. 



But why should I have to think about
distances squared? I hardly call that
intuitive. Isn’t there another way?


What we really want is a number that gives the spread 
in terms of the distance from the mean, not distance 
squared. 

The problem with the variance is that it can be quite difficult to think about 
spread in terms of distances squared. 

There’s an easy way to correct this. All we need to do is take the square root 
of the variance. We call this the standard deviation. 


Let’s work out the standard deviation for the set of numbers we had before. 
The variance is 12.67, which means that 

Standard deviation = √12.67

                   = 3.56 (to 2 decimal places)


In other words, typical values are a distance of 3.56 away from the mean. 
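The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function names `variance` and `standard_deviation` are illustrative choices, not something defined in the book, and both divide by n as the text does.

```python
import math

def variance(values):
    """Average of the squared distances from the mean (dividing by n)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

data = [1, 2, 9]  # the three numbers used in the worked example
print(round(variance(data), 2))            # 12.67
print(round(standard_deviation(data), 2))  # 3.56
```

Squaring inside `variance` is what stops the positive and negative distances from cancelling out.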


Standard deviation know-how 


We’ve seen that the standard deviation is a way of saying how far
typical values are from the mean. The smaller the standard deviation,
the closer values are to the mean. The smallest value the standard
deviation can take is 0.

Like the mean, the standard deviation has a special symbol, σ. This is
the Greek character lowercase Sigma. (We saw uppercase Sigma, Σ, in
Chapter 2 to represent summation.)

To find σ, start off by calculating the variance, and then
take the square root.

σ = √variance

σ² = variance


I’m the standard
deviation. If you need to
measure distances from
the mean, give me a call.
interview with standard deviation


This week’s interview: 

Getting the measure of Standard Deviation 


Head First: Hey, Standard Deviation, great to see
you. 

Standard Deviation: It’s a real pleasure, Head
First. 

Head First: To start off, I was wondering if you 
could tell me a bit more about yourself and what you 
do. 

Standard Deviation: I’m really all about measuring
the spread of data. Mean does a great job of telling 
you what’s going on at the center, but quite often, 
that’s not enough. Sometimes Mean needs support to 
give a more complete picture. That’s where I come in. 
Mean gives the average value, and I say how values 
vary. 

Head First: Without meaning to be rude, why
should I care about how values vary? Is it really all 
that important? Surely it’s enough to know just the 
average of a set of values. 

Standard Deviation: Let me give you an example.
How would you feel if you ordered a meal from the 
local diner, and when it arrived, you saw that half of it 
was burnt and the other half raw? 

Head First: I’d probably feel unhappy, hungry, and 
ready to sue the diner. Why? 

Standard Deviation: Well, according to Mean, 
your meal would have been cooked at the perfect 
temperature. Clearly, that’s not the full picture; what 
you really need to know is the variation. That’s where 
I come in. I look at what Mean thinks is a typical 
value, and I say how you can expect values to vary 
from that number. 

Head First: I think I get it. Mean gives the average,
and you indicate spread. How do you do that, though? 


Standard Deviation: That’s easy. I just say how 
far values are from the mean, on average. Suppose 
the standard deviation of a set of values is 3 cm. You 
can think of that as saying values are, on average, 3 
cm away from the mean. There’s a bit more to it than 
that, but if you think along those lines, you’re on the 
right track. 

Head First: Speaking of numbers, Standard
Deviation, is it better if you’re large or small? 

Standard Deviation: Well, that really all depends on
what you’re using me for. If you’re manufacturing 
machine parts, you want me to be small, so you can 
be sure all the pieces are about the same. If you’re 
looking at wages in a large company, I’ll naturally be 
quite large. 

Head First: I see. Tell me, do you have anything to
do with Variance? 

Standard Deviation: It’s funny you should ask that.
Variance is just an alter ego of mine. Square me, and I 
turn into Variance. Take the square root of Variance, 
and there I am again. We’re a bit like Clark Kent and 
Superman, but without the cape. 

Head First: Just one more question. Do you ever feel
overshadowed by Mean? After all, he gets a lot more 
attention than you. 

Standard Deviation: Of course not. We’re great
friends, and we support each other. Besides, that 
would make me sound negative. I’m never negative. 

Head First: Standard Deviation, thank you for your
time. 

Standard Deviation: It’s been a pleasure.
It’s time for you to flex those standard deviation muscles. Calculate the mean and standard
deviation for the following sets of numbers. 


1 2 3 4 5 6 7 


1 2 3 4 5 6 
exercise solution


It’s time for you to flex those standard deviation muscles. Calculate the mean and standard
deviation for the following sets of numbers.


1 2 3 4 5 6 7

Let’s start off by calculating the mean.

μ = (1 + 2 + 3 + 4 + 5 + 6 + 7) / 7
  = 28 / 7
  = 4

Variance = [(1 − 4)² + (2 − 4)² + (3 − 4)² + (4 − 4)² + (5 − 4)² + (6 − 4)² + (7 − 4)²] / 7
         = [(−3)² + (−2)² + (−1)² + 0² + 1² + 2² + 3²] / 7
         = (9 + 4 + 1 + 0 + 1 + 4 + 9) / 7
         = 28 / 7
         = 4

σ = √4 = 2


1 2 3 4 5 6

μ = (1 + 2 + 3 + 4 + 5 + 6) / 6
  = 21 / 6
  = 3.5

Variance = [(1 − 3.5)² + (2 − 3.5)² + (3 − 3.5)² + (4 − 3.5)² + (5 − 3.5)² + (6 − 3.5)²] / 6
         = [(−2.5)² + (−1.5)² + (−0.5)² + 0.5² + 1.5² + 2.5²] / 6
         = (6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25) / 6
         = 17.5 / 6
         = 2.92 (to 2 decimal places)

σ = √2.92
  = 1.71 (to 2 d.p.)
Those calculations were tricky.
Isn’t there an easier way? 



The standard deviation calculation 
can quickly become complicated. 

To find the standard deviation, you first have to
calculate variance, finding (x − μ)² for every value of x.

But there is a much simpler formula for variance
that produces the same result. The equation’s on
the opposite page, but first you’ll need to rescue the
derivation from the pool.





















Pool Puzzle

There’s an easier calculation for calculating the 
variance, but what is it? Your job is to take 
equation snippets from the pool, and 
place them into the blank lines in the 
derivation. Each snippet will be used 
only once, but you won’t need to use 
every one. Your goal is to get to the 
equation at the end. 


Σ(x − μ)² / n = Σ(x − μ)(x − μ) / n


Note: each snippet 

























pool puzzle solution 



Pool Puzzle

There’s an easier calculation for calculating the 
variance, but what is it? Your job is to take 
equation snippets from the pool, and 
place them into the blank lines in the 
derivation. Each snippet will be used 
only once, but you won’t need to use 
every one. Your goal is to get to the 
equation at the end. 


Σ(x − μ)² / n = Σ(x − μ)(x − μ) / n

              = Σ(x² − 2μx + μ²) / n

              = Σx²/n − 2μ(Σx/n) + Σμ²/n

              = Σx²/n − 2μ² + nμ²/n        (Σx/n = μ, and adding up n copies of μ² gives nμ²)

              = Σx²/n − 2μ² + μ²

              = Σx²/n − μ²
A quicker calculation for variance


As you’ve seen, the standard deviation is a good way of measuring
spread, but the necessary variance calculation quickly becomes
complicated. The difficulty lies in having to calculate (x − μ)² for
every value of x. The more values you’re dealing with, the easier it
is to make a mistake, particularly if μ is a long decimal number.



Vital Statistics
Variance

Here’s a quicker way to calculate the variance:

Variance = Σx² / n − μ²


The advantage of this method is that you don’t have to calculate
(x − μ)², which means that, in practice, it’s less tricky to deal with,
and there’s less of a chance you’ll make a mistake.
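To see that the two forms of the variance really do agree, here is a small Python sketch that computes both for the set 1 to 6. The function names `variance_definition` and `variance_shortcut` are made up for this illustration.

```python
def variance_definition(values):
    """Variance as the average of (x - mu)^2."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def variance_shortcut(values):
    """Variance as (sum of x^2) / n minus mu^2."""
    n = len(values)
    mu = sum(values) / n
    return sum(x * x for x in values) / n - mu ** 2

data = [1, 2, 3, 4, 5, 6]
print(round(variance_definition(data), 2))  # 2.92
print(round(variance_shortcut(data), 2))    # 2.92
```

The shortcut only needs the sum of the squares and the mean, so there is no per-value subtraction to get wrong when working by hand.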


there are no
Dumb Questions


Q: So which form of the variance
equation should I use?

A: If you’re performing calculations, it’s
generally easier to use the second form,
which is:

Σx² / n − μ²

This is particularly important if you have a
mean with lots of decimals.

Q: How do I work out the standard
deviation with this form of the variance
equation?

A: Exactly the same way as before. Taking
the square root of the variance gives you the
standard deviation.

Q: What if I’m told what the standard
deviation is, can I find the variance?

A: Yes, you can. The standard deviation
is the square root of the variance, which
means that the variance is the square of the
standard deviation. To find the variance from
the standard deviation, square the value of
the standard deviation.

Q: I find the standard deviation really
confusing. What is it again?

A: The standard deviation is a way of
measuring spread. It describes how far typical
values are from the mean.

If the standard deviation is high, this means
that values are typically a long way from the
mean. If the standard deviation is low, values
tend to be close to the mean.

Q: Can the standard deviation ever be 0?

A: Yes, it can. The standard deviation is
0 if all of the values are the same. In other
words, if each value is a distance of 0 away
from the mean, the standard deviation will
be 0.

Q: What units is standard deviation
measured in?

A: It’s measured in the same units as
your data. If your measurements are in
centimeters, and the standard deviation
is 1, this means that values are typically 1
centimeter away from the mean.

Q: I’m sure I’ve seen formulas for
variance where you divide by (n − 1)
instead of n. Is that wrong?

A: It’s not wrong, but that form of the
variance is really used when you’re dealing
with samples. We’ll show you more about this
when we talk about sampling later in the book.
be the coach


BE the coach

Here are the scores for the
three players. The mean for
each of them is 10. Your job
is to play like you’re the
coach, and work out the
standard deviation for each
player. Which player is the
most reliable one for your team?


Player 1

Score       7   9  10  11  13
Frequency   1   2   4   2   1


Player 2

Score       7   8   9  10  11  12  13
Frequency   1   1   2   2   2   1   1


Player 3

Score       3   6   7  10  11  13  30
Frequency   2   1   2   3   1   1   1
Exercise


The generous CEO of Starbuzz Coffee wants to give all his employees a pay raise. He’s not sure 
whether to give everyone a straight $2,000 raise or increase salaries by 10%. 


a) What happens to the standard deviation if everyone at Starbuzz is given a $2,000 pay raise? 


b) What happens to the standard deviation if everyone at Starbuzz is given a 10% pay raise instead? 
be the coach solution


BE the coach Solution

Here are the scores for the
three players. The mean for
each of them is 10. Your job
is to play like you’re the
coach, and work out the
standard deviation for each
player. Which player is the
most reliable one for your team?


Player 1

Score       7   9  10  11  13
Frequency   1   2   4   2   1

Variance = [7² + 2(9²) + 4(10²) + 2(11²) + 13²] / 10 − 100
         = (49 + 162 + 400 + 242 + 169) / 10 − 100
         = 2.2

Standard Deviation = √2.2 = 1.48


Player 2

Score       7   8   9  10  11  12  13
Frequency   1   1   2   2   2   1   1

Variance = [7² + 8² + 2(9²) + 2(10²) + 2(11²) + 12² + 13²] / 10 − 100
         = (49 + 64 + 162 + 200 + 242 + 144 + 169) / 10 − 100
         = 3

Standard Deviation = √3 = 1.73


Player 3

Score       3   6   7  10  11  13  30
Frequency   2   1   2   3   1   1   1

Variance = [2(3²) + 6² + 2(7²) + 3(10²) + 11² + 13² + 30²] / 11 − 100
         = (18 + 36 + 98 + 300 + 121 + 169 + 900) / 11 − 100
         = 49.27

Standard Deviation = √49.27 = 7.02


Player 1 and Player 2 both have small standard deviations, so the values are
clustered around the mean. But Player 3 has a standard deviation of 7.02, meaning
scores are typically 7.02 points away from the mean. So Player 1 is the
most reliable, and Player 3 is the least.
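The coach’s calculation can be sketched in Python by working straight from the score and frequency tables. The helper name `std_from_frequencies` and the dictionary layout are assumptions made for this illustration; the method is the quicker variance formula, with each squared score weighted by its frequency.

```python
import math

def std_from_frequencies(table):
    """Standard deviation from a {score: frequency} table, dividing by n."""
    n = sum(table.values())
    mu = sum(score * freq for score, freq in table.items()) / n
    mean_of_squares = sum(freq * score ** 2 for score, freq in table.items()) / n
    return math.sqrt(mean_of_squares - mu ** 2)

players = {
    "Player 1": {7: 1, 9: 2, 10: 4, 11: 2, 13: 1},
    "Player 2": {7: 1, 8: 1, 9: 2, 10: 2, 11: 2, 12: 1, 13: 1},
    "Player 3": {3: 2, 6: 1, 7: 2, 10: 3, 11: 1, 13: 1, 30: 1},
}

for name, table in players.items():
    print(name, round(std_from_frequencies(table), 2))
# Player 1 1.48
# Player 2 1.73
# Player 3 7.02
```

The smallest result belongs to the most consistent scorer, which matches the coach’s conclusion.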
Exercise Solution


The generous CEO of Starbuzz Coffee wants to give all his employees a pay raise. He’s not sure
whether to give everyone a straight $2,000 raise or increase salaries by 10%.


a) What happens to the standard deviation if everyone at Starbuzz is given a $2,000 pay raise?

The standard deviation stays exactly the same. The figures are, in effect, picked up
and moved sideways, so the standard deviation doesn’t change.

standard deviation = √( Σ((x + 2,000) − (μ + 2,000))² / n )

                   = √( Σ(x + 2,000 − μ − 2,000)² / n )

                   = √( Σ(x − μ)² / n )

                   = original standard deviation


b) What happens to the standard deviation if everyone at Starbuzz is given a 10% pay raise instead?

The standard deviation is multiplied by 110%, or 1.1. The figures are stretched,
so the standard deviation increases.

standard deviation = √( Σ(1.1x − 1.1μ)² / n )

                   = √( Σ1.1²(x − μ)² / n )

                   = 1.1 √( Σ(x − μ)² / n )

                   = 1.1 times original standard deviation
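Both results are easy to check numerically. In the sketch below the salary figures are invented purely for illustration (the book gives no actual Starbuzz salaries), and `statistics.pstdev` from Python’s standard library is used because it divides by n, like the formula in this chapter.

```python
import math
import statistics

salaries = [18000, 22000, 25000, 31000, 54000]  # made-up example salaries

flat_raise = [s + 2000 for s in salaries]     # everyone gets $2,000 more
percent_raise = [s * 1.1 for s in salaries]   # everyone gets 10% more

original = statistics.pstdev(salaries)

# Adding a constant shifts every value and the mean together: spread unchanged.
print(math.isclose(statistics.pstdev(flat_raise), original))            # True

# Multiplying by 1.1 stretches every distance from the mean by 1.1.
print(math.isclose(statistics.pstdev(percent_raise), 1.1 * original))   # True
```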
standard scores


What if we need a baseline for comparison?

We’ve seen how the standard deviation can be used to measure how variable a 
set of values are, and we’ve used it to pick out the most reliable player for the 
Statsville All Stars. The standard deviation has other uses, too. 

Imagine a situation in which you have two basketball players of different ability. 
The first player gets the ball into the net an average of 70% of the time, and he 
has a standard deviation of 20%. The second player has a mean of 40% and a 
standard deviation of 10%. 



In a particular practice session, Player 1 gets the ball into the net 75% of the 
time, and Player 2 makes a basket 55% of the time. Which player does best 
against their personal track record? 



That’s easy: Player 1
does best. Player 1 scores 75%
of the time, and Player 2 only
scores 55% of the time.


Just looking at the percentages doesn’t give 
the full picture. 

75% sounds like a high percentage, but we’re not taking into account the 
mean and standard deviation of each player. Each player has scored more 
than their personal mean, but which has fared better against their personal 
track record? How can we compare the two players? 

[Figure annotation: the two players have different means and standard deviations.]



Does this sort of situation sound impossible? Don’t worry, we can achieve 
this with the standard score, or z-score. 
Use standard scores to compare values across data sets


Standard scores give you a way of comparing values across different sets of data 
where the mean and standard deviation differ. They’re a way of comparing 
related data values in different circumstances. As an example, you can use 
standard scores to compare each player’s performance relative to his personal 
track record — a bit like a personal trainer would. 

You find the standard score of a particular value using the mean and standard 
deviation of the entire data set. The standard score is normally denoted by the 
letter z, and to find the standard score of a particular value x, you use the formula:

z = (x − μ) / σ


Let’s calculate the standard scores for each player, and see what those 
scores tell us. 


Calculating standard scores 

Let’s start by calculating z₁, the standard score of Player 1.

z₁ = (75 − 70) / 20

   = 5 / 20

   = 0.25


So using the mean and standard deviation to standardize the score, 
Player 1 gets 0.25. What about the score for Player 2? 


z₂ = (55 − 40) / 10

   = 15 / 10

   = 1.5


This gives us a standard score of 1.5 for Player 2, compared with a 
standard score of 0.25 for Player 1. But what does this actually mean? 
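The two calculations above follow the same pattern, so they can be wrapped in one small function. This is a sketch only; the name `standard_score` is an illustrative choice.

```python
def standard_score(x, mu, sigma):
    """z = (x - mu) / sigma: how many standard deviations x is from the mean."""
    return (x - mu) / sigma

z1 = standard_score(75, mu=70, sigma=20)  # Player 1
z2 = standard_score(55, mu=40, sigma=10)  # Player 2

print(z1)  # 0.25
print(z2)  # 1.5
```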
interpreting standard scores


Interpreting standard scores


Standard scores give us a way of comparing values across different data sets even when the 
sets of data have different means and standard deviations. They’re a way of comparing 
values as if they came from the same set of data or distribution. 

So what does this mean for our basketball players? 

Each player’s shooting success rate has a different mean and standard deviation, which 
makes it difficult to compare how the players are performing relative to their own track 
record. We can see that in a particular practice, one player got the ball in the net more 
times than the other. We also notice that both players are scoring at a higher rate than their 
average. The difficulty lies in comparing performances relative to the personal track record 
of each player. 


The standard score makes such comparisons possible by transforming each set of data into 
a more generic distribution. We can find the standard score of each player at the practice 
session, and then transform and compare them. 


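[Figure: two distributions side by side. Player 1: μ = 70, σ = 20, with the practice score of 75 marked. Player 2: μ = 40, σ = 10. It’s difficult to compare these two sets of data directly, but we can compare them with standard scores.]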

So what does this tell us about the players? 

The standard score for Player 1 is 0.25, while the standard 
score for Player 2 is 1.5. In other words, when we 
standardize the scores, the score for Player 2 is higher. 


This means that even though Player 1 is generally a better 
shooter and put balls into the net at a higher rate than 
Player 2, Player 2 performed better relative to his own track 
record. Player 2 performed better... for him.
Standard Scores Up Close

Standard scores work by transforming sets of data into a new, theoretical distribution 
with a mean of 0 and a standard deviation of 1. It’s a generic distribution that can 
be used for comparisons. Standard scores effectively transform your data so that it 
fits this model while making sure it keeps the same basic shape. 
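A quick way to convince yourself of this is to standardize a whole data set and look at the result. The sketch below uses Player 1’s scores from the coach exercise; the helper name `standardize` is an assumption, and the data must not all be identical (otherwise the standard deviation is 0 and the division fails).

```python
import statistics

def standardize(values):
    """Replace every value with its standard score."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # divides by n, as in this chapter
    return [(x - mu) / sigma for x in values]

scores = [7, 9, 9, 10, 10, 10, 10, 11, 11, 13]  # Player 1's scores
z = standardize(scores)

print(abs(statistics.mean(z)) < 1e-9)        # True: the new mean is 0
print(abs(statistics.pstdev(z) - 1) < 1e-9)  # True: the new standard deviation is 1
```

The order and relative spacing of the values are unchanged, which is what keeping the same basic shape means here.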




Standard scores can take any value, and they indicate position relative to the mean. 
Positive z-scores mean that your value is above the mean, and negative z-scores 
mean that your value is below it. If your z-score is 0, then your value is the mean 
itself. The size of the number shows how far away the value is from the mean. 


Standard deviations from the mean 

Sometimes statisticians express the relative position of a particular value in terms 
of standard deviations from the mean. As an example, a statistician may 
say that a particular value is within 1 standard deviation of the mean. It’s really 
just another way of indicating how close values are to the mean, but what does it 
mean in practice? 

We’ve seen that using z-scores transforms your data set into a generic distribution 
with a mean of 0 and a standard deviation of 1. If a value is within 1 standard 
deviation of the mean, this tells us that the standard score of the value is between 
-1 and 1. Similarly, if a value is within 2 standard deviations of the mean, the 
standard score of the value would be somewhere between -2 and 2. 
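In code, "within k standard deviations of the mean" is just a test on the size of the standard score. The function name below is hypothetical, and treating the boundary itself as "within" is an assumption made for this sketch.

```python
def within_k_standard_deviations(x, mu, sigma, k):
    """True if x lies no more than k standard deviations from the mean."""
    z = (x - mu) / sigma
    return abs(z) <= k

# The basketball players' practice scores from earlier in the chapter
print(within_k_standard_deviations(75, 70, 20, 1))  # True  (z = 0.25)
print(within_k_standard_deviations(55, 40, 10, 1))  # False (z = 1.5)
print(within_k_standard_deviations(55, 40, 10, 2))  # True
```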



Standard
score = number
of standard
deviations from
the mean.
no dumb questions



Q: So variance and standard deviation both measure the
spread of your data. How are they different from the range?

A: The range is quite a simplistic measure of the spread of your
data. It tells you the difference between the highest and lowest
values, but that’s it. You have no way of knowing how the data is
clustered within it.

The variance and standard deviation are a much better way
of measuring the variability of your data and how your data is
dispersed, as they take into account how the data is clustered.
They look at how far values typically are from the center of
your data.

Q: And what’s the difference between variance and
standard deviation? Which one should I use?

A: The standard deviation is the square root of the variance,
which means you can find one from the other.

The standard deviation is probably the most intuitive, as it tells you
roughly how far your values are, on average, from the mean.

Q: How do standard scores fit into all this?

A: Standard scores use the mean and standard deviation to
convert values in a data set to a more generic distribution, while at
the same time, making sure your data keeps the same basic shape.

They’re a way of comparing different values across different data
sets even when the data sets have different means and standard
deviations. They’re a way of measuring relative standing.

Q: Do standard scores have anything to do with detecting
outliers?

A: Good question! Determining outliers can be subjective, but
sometimes outliers are defined as being more than 3 standard
deviations from the mean. Statisticians have different opinions about
this though, so be warned.


BULLET POINTS

■ The variance and standard deviation measure
how values are dispersed by looking at how far
values are from the mean.

■ The variance is calculated using

Σ(x − μ)² / n

■ An alternate form is

Σx² / n − μ²

■ The standard deviation is equal to the square root
of the variance, and the variance is the standard
deviation squared.

■ Standard scores, or z-scores, are a way of
comparing values across different sets of data
where the means and standard deviations are
different. To find the standard score of a value x,
use:

z = (x − μ) / σ
Complete the table below. Name each type of measure of dispersion we’ve encountered
in the chapter, and show how to calculate it. Try your hardest to fill this out without 
looking back through the chapter. 


Statistic                  How to calculate

Range                      ______

______                     Upper quartile - Lower quartile

Standard Deviation (σ)     ______

Standard Score             ______








exercise solution 



Solution


Complete the table below. Name each type of measure of dispersion we’ve encountered 
in the chapter, and show how to calculate it. Try your hardest to fill this out without 
looking back through the chapter. 


Statistic                  How to calculate

Range                      Upper bound − Lower bound

Interquartile range        Upper quartile − Lower quartile

Standard Deviation (σ)     √( Σ(x − μ)² / n )  or  √( Σx² / n − μ² )
                           (Both of these give the same result.)

Standard Score             z = (x − μ) / σ
Statsville All Stars win the league!

All the basketball matches for the season have now been played, 
and the Statsville All Stars finished at the top of the league. You 
clearly helped the coach pick the best player for the team. 

Just remember: you owe it all to the friendly neighborhood 
standard deviation. 
Taking Chances


What’s the probability he’s
remembered I’m allergic to
non-precious metals?


Life is full of uncertainty. 

Sometimes it can be impossible to say what will happen from one minute to the 
next. But certain events are more likely to occur than others, and that’s where 
probability theory comes into play. Probability lets you predict the future by 
assessing how likely outcomes are, and knowing what could happen helps you 
make informed decisions. In this chapter, you’ll find out more about probability 
and learn how to take control of the future! 




welcome to fat dan’s casino 


Fat Dan’s Grand Slam

Fat Dan’s Casino is the most popular casino in the 
district. All sorts of games are offered, from roulette 
to slot machines, poker to blackjack. 

It just so happens that today is your lucky day. Head
First Labs has given you a whole rack of chips to
squander at Fat Dan’s, and you get to keep any
winnings. Want to give it a try? Go on, you know
you want to.











There’s a lot of activity over at the roulette wheel, 
and another game is just about to start. Let’s see 
how lucky you are. 
calculating probabilities


Roll up for roulette! 



You’ve probably seen people playing roulette in movies even 
if you’ve never tried playing yourself. The croupier spins a 
roulette wheel, then spins a ball in the opposite direction, and 
you place bets on where you think the ball will land. 

The roulette wheel used in Fat Dan’s Casino has 38 pockets 
that the ball can fall into. The main pockets are numbered 
from 1 to 36, and each pocket is colored either red or black. 
There are two extra pockets numbered 0 and 00. These 
pockets are both green. 


[Figure: roulette wheel]


You can place all sorts of bets with roulette. For instance, 
you can bet on a particular number, whether that number 
is odd or even, or the color of the pocket. You’ll hear more 
about other bets when you start playing. One other thing to 
remember: if the ball lands on a green pocket, you lose. 


Roulette boards make it easier to keep track of which 
numbers and colors go together. 


[Figure: roulette board, with the outside bets 1st DOZEN, 2nd DOZEN, 3rd DOZEN, 1-18, EVEN, ODD, and 19-36. A larger version appears later in this chapter. You place bets on the pocket the ball will fall into on the wheel using the board. If the ball falls into the 0 or 00 pocket, you lose.]
roulette board


[Figure: full-size roulette board to cut out]


Your very own roulette board

You’ll be placing a lot of roulette bets in this chapter.
Here’s a handy roulette board for you to cut out and
keep. You can use it to help work out the probabilities in
this chapter.

Just be careful with those scissors.


Place your bets now!

Have you cut out your roulette board? The game is 
just beginning. Where do you think the ball will land? 
Choose a number on your roulette board, and then 
we’ll place a bet. 



Hold it right there! 

You want me to just make 
random guesses? I stand 
no chance of winning if I 
just do that. 


Right, before placing any bets, it makes 
sense to see how likely it is that you’ll win. 

Maybe some bets are more likely than others. It sounds 
like we need to look at some probabilities... 





What things do you need to think about 
before placing any roulette bets? Given 
the choice, what sort of bet would you 
make? Why? 
finding probability


What are the chances?


Have you ever been in a situation where you’ve wondered “Now, 
what were the chances of that happening?” Perhaps a friend has 
phoned you at the exact moment you’ve been thinking about them, 
or maybe you’ve won some sort of raffle or lottery. 

Probability is a way of measuring the chance of something 
happening. You can use it to indicate how likely an occurrence is 
(the probability that you’ll go to sleep some time this week), or how 
unlikely (the probability that a coyote will try to hit you with an 
anvil while you’re walking through the desert). In stats-speak, an 
event is any occurrence that has a probability attached to it — in 
other words, an event is any outcome where you can say how likely 
it is to occur. 


Probability is measured on a scale of 0 to 1. If an event is
impossible, it has a probability of 0. If it’s an absolute certainty,
then the probability is 1. A lot of the time, you’ll be dealing with
probabilities somewhere in between.


Here are some examples on a probability scale. 


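[Figure: probability scale running from 0 (impossible) through 0.5 (equally likely to happen or not) to 1 (certain). A freak coyote anvil attack sits close to 0. Throwing a coin and it landing heads up happens in about half of all tosses, so it sits at 0.5. Falling asleep at some point during a 168-hour period is almost certain, so it sits close to 1.]


Vital Statistics
Event

An outcome or occurrence that has a probability assigned to it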

Can you see how probability 
relates to roulette? 

If you know how likely the ball is to land on a 
particular number or color, you have some way 
of judging whether or not you should place a 
particular bet. It’s useful knowledge if you want 
to win at roulette. 
Sharpen your pencil


Let’s try working out a probability for roulette, the probability of 
the ball landing on 7. We’ll guide you every step of the way. 


1. Look at your roulette board. How many pockets are there for the ball to land in? 


2. How many pockets are there for the number 7? 


3. To work out the probability of getting a 7, take your answer to question 2 and divide it by your 
answer to question 1. What do you get? 


4. Mark the probability on the scale below. How would you describe how likely it is that you’ll get a 7? 


0                         0.5                         1
sharpen solution


You had to work out a probability for roulette, the probability of 
the ball landing on 7. Here’s how you calculate the solution, step 
by step. 

1. Look at your roulette board. How many pockets are there for the ball to land in? 

There are 38 pockets. Don’t forget that the ball can land in 0 or 00 as well.


4. Mark the probability on the scale below. How would you describe how likely it is that you’ll get a 7?


Find roulette probabilities 

Let’s take a closer look at how we calculated that probability. 

Here are all the possible outcomes from spinning the roulette 
wheel. The thing we’re really interested in is winning the 
bet — that is, the ball landing on a 7. 


[Figure: roulette board showing all the possible outcomes, with the one event we’re interested in, the ball landing on 7, picked out.]

To find the probability of winning, we take the number of 
ways of winning the bet and divide by the number of possible 
outcomes like this: 


Probability = number of ways of winning / number of possible outcomes





We can write this in a more general way, too. For the 
probability of any event A: 


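P(A) = n(A) / n(S)

Here P(A) is the probability of event A occurring, n(A) is the number of
ways of getting event A, and n(S) is the number of possible outcomes.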










S is known as the possibility space, or sample space. It’s 
a shorthand way of referring to all of the possible outcomes. 
Possible events are all subsets of S. 
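The general formula translates directly into code if you list the possibility space explicitly. This is a sketch that assumes every pocket is equally likely; the names `S`, `A`, and `probability` are illustrative, and Python’s `fractions.Fraction` is used so the answer stays exact.

```python
from fractions import Fraction

# The possibility space S: 0, 00, and the numbers 1 to 36 (38 pockets in all)
S = ["0", "00"] + [str(n) for n in range(1, 37)]

def probability(event, space):
    """P(A) = n(A) / n(S), assuming each outcome in the space is equally likely."""
    favourable = [outcome for outcome in space if outcome in event]
    return Fraction(len(favourable), len(space))

A = {"7"}  # the event "the ball lands on 7"
p = probability(A, S)

print(p)                   # 1/38
print(round(float(p), 3))  # 0.026
```

An event that is not in the possibility space at all, such as `{"38"}`, comes out with probability 0.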
probabilities and venn diagrams


You can visualize probabilities with a Venn diagram


Probabilities can quickly get complicated, so it’s often 
very useful to have some way of visualizing them. 
One way of doing so is to draw a box representing 


the possibility space S, and then draw circles for each 
relevant event. This sort of diagram is known as a Venn 
diagram. Here’s a Venn diagram for our roulette 
problem, where A is the event of getting a 7. 


[Figure: Venn diagram. A box represents the possibility space S, and a circle inside it represents event A.]


Very often, the numbers themselves aren’t shown on the 
Venn diagram. Instead of numbers, you have the option 
of using the actual probabilities of each event in the 
diagram. It all depends on what kind of information you 
need to help you solve the problem. 

Complementary events 

There’s a shorthand way of indicating the event
that A does not occur: A'. A' is known as the
complementary event of A.

There’s a clever way of calculating P(A'). A' covers every
possibility that’s not in event A, so between them, A and
A' must cover every eventuality. If something’s in A, it
can’t be in A', and if something’s not in A, it must be in
A'. This means that if you add P(A) and P(A') together,
you get 1. In other words, there’s a 100% chance that
something will be in either A or A'. This gives us

P(A) + P(A') = 1

or

P(A') = 1 - P(A)
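As a small illustration of the complement rule, take the event that the ball lands in a green pocket. The wheel has 2 green pockets out of 38, so the probability of not landing on green follows by subtraction. The variable names here are made up for the sketch.

```python
from fractions import Fraction

p_green = Fraction(2, 38)    # 2 green pockets (0 and 00) out of 38
p_not_green = 1 - p_green    # the complementary event

print(p_not_green)                 # 18/19
print(p_green + p_not_green == 1)  # True
```

Subtracting from 1 is often quicker than counting every outcome that is not in the event.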
BE the croupier

Your job is to imagine you’re
the croupier and work out the
probabilities of various events.
For each event below, write down
the probability of a successful
outcome.




P(9) 


P(Green) 


P(Black) 


P(38) 
BE the croupier Solution

Your job was to imagine you’re the croupier and work out the probabilities of various events. For each event you should have written down the probability of a successful outcome.

P(9)

The probability of getting a 9 is exactly the same as getting a 7, as there’s an equal chance of the ball falling into each pocket.

Probability = 1/38
            = 0.026 (to 3 decimal places)

P(Green)

2 of the pockets are green, and there are 38 pockets total, so:

Probability = 2/38
            = 0.053 (to 3 decimal places)

P(Black)

18 of the pockets are black, and there are 38 pockets, so:

Probability = 18/38
            = 0.474 (to 3 decimal places)

P(38)

This event is actually impossible: there is no pocket labeled 38. Therefore, the probability is 0.

there are no Dumb Questions

Q: Why do I need to know about probability? I thought I was learning about statistics.

A: There’s quite a close relationship between probability and statistics. A lot of statistics has its origins in probability theory, so knowing probability will take your statistics skills to the next level. Probability theory can help you make predictions about your data and see patterns. It can help you make sense of apparent randomness. You’ll see more about this later.

Q: Are probabilities written as fractions, decimals, or percentages?

A: They can be written as any of these. As long as the probability is expressed in some form as a value between 0 and 1, it doesn’t really matter.

Q: I’ve seen Venn diagrams before in set theory. Is there a connection?

A: There certainly is. In set theory, the possibility space is equivalent to the set of all possible outcomes, and a possible event forms a subset of this. You don’t have to already know any set theory to use Venn diagrams to calculate probability, though, as we’ll cover everything you need to know in this chapter.

Q: Do I always have to draw a Venn diagram? I noticed you didn’t in that last exercise.

A: No, you don’t have to. But sometimes they can be a useful tool for visualizing what’s going on with probabilities. You’ll see more situations where this helps you later on.

Q: Can anything be in both events A and A'?

A: No. A' means everything that isn’t in A. If an element is in A, then it can’t possibly be in A'. Similarly, if an element is in A', then it can’t be in A. The two events are mutually exclusive, so no elements are shared between them.


It’s time to play!

A game of roulette is just about to begin. 

Look at the events on the previous page. 
We’ll place a bet on the one that’s most 
likely to occur — that the ball will land in a 
black pocket. 




Let's see what 
happens. 


And the winning number is... 

Oh dear! Even though our most likely probability was 
that the ball would land in a black pocket, it actually 
landed in the green 0 pocket. You lose some of your 
chips. 

The ball landed in the 0 pocket, so you lose some chips.





There must be a fix! The probability 
of getting a black is far higher than 
getting a green or 0. What went wrong? I 
want to win! 


Probabilities are only indications of how likely 
events are; they’re not guarantees. 

The important thing to remember is that a probability indicates 
a long-term trend only. If you were to play roulette thousands 
of times, you would expect the ball to land in a black pocket 
in 18/38 spins, approximately 47% of the time, and a green 
pocket in 2/38 spins, or 5% of the time. Even though you’d 
expect the ball to land in a green pocket relatively infrequently, 
that doesn’t mean it can’t happen. 
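To get a feel for what "long-term trend" means, you could simulate a lot of spins. The sketch below is only an illustration: it assumes a fair wheel with 18 black, 18 red, and 2 green pockets, and the names POCKET_COLORS and simulate are invented for this example.

```python
import random

# Assumed wheel make-up: 18 black, 18 red, and 2 green pockets (38 in total).
POCKET_COLORS = ["black"] * 18 + ["red"] * 18 + ["green"] * 2

def simulate(spins, seed=0):
    """Spin the wheel `spins` times and return the share of spins per color."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = {"black": 0, "red": 0, "green": 0}
    for _ in range(spins):
        counts[rng.choice(POCKET_COLORS)] += 1
    return {color: count / spins for color, count in counts.items()}

# With few spins the shares jump around; with many they settle near 18/38 and 2/38.
for spins in (10, 1_000, 100_000):
    print(spins, simulate(spins))
```

A short run can easily show green more often than you would expect, which is exactly what happened at the table.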


No matter how unlikely an event is, if it’s not impossible, it can still happen.




Let’s bet on an even more likely event

Let’s look at the probability of an event that should be more likely to happen. Instead of betting that the ball will land in a black pocket, let’s bet that the ball will land in a black or a red pocket. To work out the probability, all we have to do is count how many pockets are red or black, then divide by the number of pockets. Sound easy enough?

Bet: Red or Black

That’s a lot of pockets to count. We’ve already worked out P(Black) and P(Green). Maybe we can use one of these instead.

We can use the probabilities we already know to work out the one we don’t know.

Take a look at your roulette board. There are only three colors for the ball to land on: red, black, or green. As we’ve already worked out what P(Green) is, we can use this value to find our probability without having to count all those black and red pockets.

P(Black or Red) = P(Green')
                = 1 - P(Green)
                = 1 - 0.053
                = 0.947 (to 3 decimal places)


Sharpen your pencil


Don’t just take our word for it. Calculate the probability of getting 
a black or a red by counting how many pockets are black or red 
and dividing by the number of pockets. 


Sharpen your pencil Solution

Don’t just take our word for it. Calculate the probability of getting a black or a red by counting how many pockets are black or red and dividing by the number of pockets.

P(Black or Red) = 36/38
                = 0.947 (to 3 decimal places)

So P(Black or Red) = 1 - P(Green)



You can also add probabilities

There’s yet another way of working out this sort of 
probability. If we know P(Black) and P(Red), we can find 
the probability of getting a black or red by adding these two 
probabilities together. Let’s see. 


P(Black or Red) = (18 + 18)/38
                = 18/38 + 18/38
                = P(Black) + P(Red)

In this case, adding the probabilities gives exactly 
the same result as counting all the red or black 
pockets and dividing by 38. 
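If you want to confirm that counting, taking the complement, and adding all agree, a few lines of Python will do it. This is a sketch under the assumption that the wheel has 18 black, 18 red, and 2 green pockets; COUNTS and p are names chosen just for this example.

```python
from fractions import Fraction

# Assumed pocket counts for the wheel: 18 black, 18 red, 2 green.
COUNTS = {"black": 18, "red": 18, "green": 2}
TOTAL = sum(COUNTS.values())  # 38 pockets

def p(color):
    """P(color) = pockets of that color / total pockets."""
    return Fraction(COUNTS[color], TOTAL)

by_counting = Fraction(COUNTS["black"] + COUNTS["red"], TOTAL)  # count, then divide
by_complement = 1 - p("green")                                  # 1 - P(Green)
by_adding = p("black") + p("red")                               # P(Black) + P(Red)

print(by_counting, by_complement, by_adding)
```

All three routes land on the same fraction here, but only because black and red pockets never overlap.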


Vital Statistics
Probability

To find the probability of an event A, use

P(A) = n(A) / n(S)

Vital Statistics
A'

A' is the complementary event of A. P(A') is the probability that event A does not occur.

P(A') = 1 - P(A)

there are no Dumb Questions

Q: It looks like there are three ways of dealing with this sort of probability. Which way is best?

A: It all depends on your particular situation and what information you are given.

Suppose the only information you had about the roulette wheel was the probability of getting a green. In this situation, you’d have to calculate the probability by working out the probability of not getting a green:

1 - P(Green)

On the other hand, if you knew P(Black) and P(Red) but didn’t know how many different colors there were, you’d have to calculate the probability by adding together P(Black) and P(Red).

Q: So I don’t have to work out probabilities by counting everything?

A: Often you won’t have to, but it all depends on your situation. It can still be useful to double-check your results, though.

Q: If some events are so unlikely to happen, why do people bet on them?

A: A lot depends on the sort of return that is being offered. In general, the less likely the event is to occur, the higher the payoff when it happens. If you win a bet on an event that has a high probability, you’re unlikely to win much money. People are tempted to make bets where the return is high, even though their chance of winning is negligible.

Q: Does adding probabilities together like that always work?

A: Think of this as a special case where it does. Don’t worry, we’ll go into more detail over the next few pages.
You win!

This time the ball landed in a red pocket, the number 7, so you win some chips.

This time, you picked a winning pocket: a red one.




Time for another bet 

Now that you’re getting the hang of calculating 
probabilities, let’s try something else. What’s the 
probability of the ball landing on a black or even pocket? 
Sometimes you can add together
probabilities, but it doesn’t work in 
all circumstances. 

We might not be able to count on being able to do 
this probability calculation in quite the same way 
as the previous one. Try the exercise on the next 
page, and see what happens. 





Sharpen your pencil


Let’s find the probability of getting a black or even 
(assume 0 and 00 are not even). 


1. What’s the probability of getting a black? 


2. What’s the probability of getting an even number? 


3. What do you get if you add these two probabilities together? 


4. Finally, use your roulette board to count all the holes that are either black or even, then divide 
by the total number of holes. What do you get? 


[Figure: roulette board layout, showing the numbered pockets and the betting areas: 1st dozen, 2nd dozen, 3rd dozen, 1-18, even, odd, and 19-36.]
Sharpen your pencil Solution

Let’s find the probability of getting a black or even (assume 0 and 00 are not even).

1. What’s the probability of getting a black?

18/38 = 0.474

2. What’s the probability of getting an even number?

18/38 = 0.474

3. What do you get if you add these two probabilities together?

0.947

4. Finally, use your roulette board to count all the holes that are either black or even, then divide by the total number of holes. What do you get?

26/38 = 0.684



I don’t get it. Adding 
probabilities worked OK last 
time. What went wrong? 


Let’s take a closer look... 
Exclusive events and intersecting events

When we were working out the probability of the ball landing in a 
black or red pocket, we were dealing with two separate events, the 
ball landing in a black pocket and the ball landing in a red pocket. 
These two events are mutually exclusive because it’s impossible for 
the ball to land in a pocket that’s both black and red. 



If two events are mutually exclusive, only one of the two can occur.


What about the black and even events? This time the events 
aren’t mutually exclusive. It’s possible that the ball could land in 
a pocket that’s both black and even. The two events intersect. 



Some of the pockets are both black and even.

If two events intersect, it’s possible they can occur simultaneously.





What sort of effect do you think 
this intersection could have 
had on the probability? 
Problems at the intersection

Calculating the probability of getting a black or even went 
wrong because we included black and even pockets twice. 
Here’s what happened. 

First of all, we found the probability of getting a black 
pocket and the probability of getting an even number. 


P(Black) = 18/38
         = 0.474

P(Even) = 18/38
        = 0.474



When we added the two probabilities together, we 
counted the probability of getting a black and even 
pocket twice. 


[Figure: overlapping circles for Black and Even. The intersection was included twice.]



To get the correct answer, we need to subtract the probability of getting both black and even. This gives us

P(Black or Even) = P(Black) + P(Even) - P(Black and Even)

where

P(Black ∩ Even) = 10/38
                = 0.263

We can now substitute in the values we just calculated to find P(Black or Even):

P(Black or Even) = 18/38 + 18/38 - 10/38 = 26/38 = 0.684
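Here is a small Python sketch of the double-counting problem and its fix. It assumes the usual American wheel layout for which numbers are black (that list is an assumption, not something spelled out in the text), and it treats 0 and 00 as not even, as the exercise does.

```python
from fractions import Fraction

# Assumed layout: the black numbers on a standard American wheel.
BLACK = {2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35}
EVEN = {n for n in range(1, 37) if n % 2 == 0}  # 0 and 00 are not treated as even
N_POCKETS = 38  # 1-36 plus 0 and 00

def p(event):
    """Probability of an event given as a set of equally likely pockets."""
    return Fraction(len(event), N_POCKETS)

naive_sum = p(BLACK) + p(EVEN)                    # counts the overlap twice
corrected = p(BLACK) + p(EVEN) - p(BLACK & EVEN)  # subtract the overlap once
by_counting = p(BLACK | EVEN)                     # count each pocket exactly once

print(naive_sum, corrected, by_counting)
```

The set union BLACK | EVEN holds each pocket only once, which is why counting it directly agrees with the corrected sum rather than the naive one.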


Some more notation


There’s a more general way of writing this using some 
more math shorthand. 

First of all, we can use the notation A ∩ B to refer to the intersection between A and B. You can think of this symbol as meaning “and.” It takes the common elements of events.

A ∪ B, on the other hand, means the union of A and B. It includes all of the elements in A and also those in B. You can think of it as meaning “or.”

If P(A ∪ B) = 1, then A and B are said to be exhaustive. Between them, they make up the whole of S. They exhaust all possibilities.


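[Figure: two Venn diagrams. In the first, labeled Intersection, the overlap of A and B is shaded; in the second, labeled Union, everything in A or B is shaded.]

Sharpen your pencil

On the previous page, we found that

P(Black or Even) = P(Black) + P(Even) - P(Black and Even)

Write this equation for A and B using ∩ and ∪ notation.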
Sharpen your pencil Solution

On the previous page, we found that

P(Black or Even) = P(Black) + P(Even) - P(Black and Even)

Write this equation for A and B using ∩ and ∪ notation.

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Here P(A ∪ B) is P(A or B).





So why is the equation for exclusive 
events different? Are you just giving 
me more things to remember? 


It’s not actually that different.

Mutually exclusive events have no elements in common with each other. If you have two mutually exclusive events, the probability of getting A and B is actually 0, so P(A ∩ B) = 0. Let’s revisit our black-or-red example. In this bet, getting a red pocket on the roulette wheel and getting a black pocket are mutually exclusive events, as a pocket can’t be both red and black. This means that P(Black ∩ Red) = 0, so that part of the equation just disappears.

There’s a difference between exclusive and exhaustive.

If events A and B are exclusive, then

P(A ∩ B) = 0

If events A and B are exhaustive, then

P(A ∪ B) = 1
BE the probability

Your job is to play like you’re the probability and shade in the area that represents each of the following probabilities on the Venn diagrams.

P(A ∩ B) + P(A ∩ B')

P(A' ∩ B')

P(A ∪ B) - P(B)
BE the probability Solution

Your job was to play like you’re the probability and shade in the area that represents each of the probabilities on the Venn diagrams.

P(A ∩ B) + P(A ∩ B')

[Venn diagram: the whole of circle A is shaded.]

P(A' ∩ B')

[Venn diagram: the region of S outside both circles is shaded.]





50 sports enthusiasts at the Head First Health Club are asked whether they play baseball, 
football, or basketball. 10 only play baseball. 12 only play football. 18 only play basketball. 6 play 
baseball and basketball but not football. 4 play football and basketball but not baseball. 

Draw a Venn diagram for this probability space. How many enthusiasts play baseball in total? 
How many play basketball? How many play football? 

Are any sports’ rosters mutually exclusive? Which sports are exhaustive (fill up the possibility 
space)? 



Vital Statistics
A or B

To find the probability of getting event A or B, use

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

∪ means OR
∩ means AND
Exercise solution

[Figure: Venn diagram of the possibility space S with circles for baseball, football, and basketball.]

This information looks complicated, but drawing a Venn diagram will help us to visualize what’s going on.

By adding up the values in each circle in the Venn diagram, we can determine that there are 16 total baseball players, 28 total basketball players, and 16 total football players.

The baseball and football events are mutually exclusive. Nobody plays both baseball and football, so P(Baseball ∩ Football) = 0

The events for baseball, football, and basketball are exhaustive. Together, they fill the entire possibility space, so P(Baseball ∪ Football ∪ Basketball) = 1
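One way to double-check the Health Club totals is to store each region of the Venn diagram and add up the ones that matter. The sketch below uses the head counts from the exercise; the REGIONS dictionary and the helper names are just one possible way to lay it out.

```python
# Head counts taken from the exercise; each key is one region of the Venn diagram.
REGIONS = {
    frozenset({"baseball"}): 10,
    frozenset({"football"}): 12,
    frozenset({"basketball"}): 18,
    frozenset({"baseball", "basketball"}): 6,
    frozenset({"football", "basketball"}): 4,
}
TOTAL = 50  # enthusiasts surveyed

def players(sport):
    """Total number of enthusiasts who play `sport`, alone or with other sports."""
    return sum(n for sports, n in REGIONS.items() if sport in sports)

def both(sport_a, sport_b):
    """Number who play both sports (the size of the intersection)."""
    return sum(n for sports, n in REGIONS.items() if {sport_a, sport_b} <= sports)

print({s: players(s) for s in ("baseball", "basketball", "football")})
print("baseball and football overlap:", both("baseball", "football"))
print("regions cover everyone:", sum(REGIONS.values()) == TOTAL)
```

An overlap of zero is what makes two events mutually exclusive, and the regions adding up to all 50 people is what makes the three sports exhaustive.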


there are no Dumb Questions

Q: Are A and A' mutually exclusive or exhaustive?

A: Actually they’re both. A and A' can have no common elements, so they are mutually exclusive. Together, they make up the entire possibility space so they’re exhaustive too.

Q: Isn’t P(A ∩ B) + P(A ∩ B') just a complicated way of saying P(A)?

A: Yes it is. It can sometimes be useful to think of different ways of forming the same probability, though. You don’t always have access to all the information you’d like, so being able to think laterally about probabilities is a definite advantage.

Q: Is there a limit on how many events can intersect?

A: No. When you’re referring to the intersection between several events, use more ∩’s. As an example, the intersection of events A, B, and C is A ∩ B ∩ C.

Finding probabilities for multiple intersections can sometimes be tricky. We suggest that if you’re in doubt, draw a Venn diagram and take a good, hard look at which probabilities need to be added together and which need to be subtracted.


Another unlucky spin... 

We know that the probability of the ball landing on black or even 
is 0.684, but, unfortunately, the ball landed on 23, which is red and 
odd. 


...but it’s time for another bet


Even with the odds in our favor, we’ve been unlucky with the outcomes at 
the roulette table. The croupier decides to take pity on us and offers a little 
inside information. After she spins the roulette wheel, she’ll give us a clue 
about where the ball landed, and we’ll work out the probability based on 
what she tells us. 



Here's your next 
bet...and a hint about 
where the ball landed. 
Shh, don’t tell Fat Dan...

Bet: Even
Clue: The ball landed in a black pocket




Should we take this bet? 

How does the probability of getting even, given that we know the ball landed in a black pocket, compare to our last bet that the ball would land on black or even? Let’s figure it out.


Conditions apply 


The croupier says the ball has landed in a black pocket. 
What’s the probability that the pocket is also even? 



But we’ve already done this; it’s just the probability of getting black and even.


This is a slightly different problem 

We don’t want to find the probability of getting a pocket 
that is both black and even, out of all possible pockets. 
Instead, we want to find the probability that the pocket is 
even, given that we already know it’s black. 


In other words, we want to find out how many pockets are even out of all the black ones. Out of the 18 black pockets, 10 of them are even, so

P(Even given Black) = 10/18
                    = 0.556 (to 3 decimal places)


It turns out that even with some inside information, our odds are 
actually lower than before. The probability of even given black is 
actually less than the probability of black or even. 

However, a probability of 0.556 is still better than 50% odds, so 
this is still a pretty good bet. Let’s go for it. 
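To see the "out of all the black ones" idea in code, you can count inside the black pockets only. This sketch assumes the standard American wheel layout for the black numbers; the function name conditional is invented for the example.

```python
from fractions import Fraction

# Assumed layout: the black numbers on a standard American wheel.
BLACK = {2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35}
EVEN = {n for n in range(1, 37) if n % 2 == 0}  # 0 and 00 are not treated as even

def conditional(event, given):
    """P(event | given) when every pocket is equally likely: count inside `given` only."""
    return Fraction(len(event & given), len(given))

p_even_given_black = conditional(EVEN, BLACK)
p_black_or_even = Fraction(len(BLACK | EVEN), 38)  # the earlier bet, out of all 38 pockets

print(p_even_given_black, round(float(p_even_given_black), 3))
print(p_black_or_even, round(float(p_black_or_even), 3))
```

Notice that the denominator changes from 38 to 18: once you know the pocket is black, the other 20 pockets are out of the picture.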
Find conditional probabilities

So how can we generalize this sort of problem? First of 
all, we need some more notation to represent conditional 
probabilities, which measure the probability of one event 
occurring relative to another occurring. 



If we want to express the probability of one event happening given another one has already happened, we use the “|” symbol to mean “given.” Instead of saying “the probability of event A occurring given event B,” we can shorten it to say

P(A | B)

So now we need a general way of calculating P(A | B). What we’re interested in is the number of outcomes where both A and B occur, divided by all the B outcomes. Looking at the Venn diagram, we get:

P(A | B) = P(A ∩ B) / P(B)

Because we’re trying to find the probability of A given B, we’re only interested in the set of events where B occurs.

We can rewrite this equation to give us a way of finding P(A ∩ B):

P(A ∩ B) = P(A | B) × P(B)

It doesn’t end there. Another way of writing P(A ∩ B) is P(B ∩ A). This means that we can rewrite the formula as

P(B ∩ A) = P(B | A) × P(A)

In other words, just flip around the A and the B.



It looks like it can be difficult to show conditional probability on a Venn diagram. I wonder if there’s some other way.

Venn diagrams aren’t always the best way of visualizing conditional probability.

Don’t worry, there’s another sort of diagram you can use: a probability tree.

You can visualize conditional probabilities with a probability tree


It’s not always easy to visualize conditional probabilities with 
Venn diagrams, but there’s another sort of diagram that really 
comes in handy in this situation — the probability tree. 
Here’s a probability tree for our problem with the roulette 
wheel, showing the probabilities for getting different colored 
and odd or even pockets. 







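[Figure: probability tree for the roulette wheel. The first set of branches shows black, red, and green; the second set shows odd and even.]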



The first set of branches shows the probability of each 
outcome, so the probability of getting a black is 18/38, or 
0.474. The second set of branches shows the probability 
of outcomes given the outcome of the branch it is 
linked to. The probability of getting an odd pocket given 
we know it’s black is 8/18, or 0.444. 
Trees also help you calculate conditional probabilities

Probability trees don’t just help you visualize probabilities; they can help you to calculate them, too.

Let’s take a general look at how you can do this. Here’s another probability tree, this time with a different number of branches. It shows two levels of events: A and A' and B and B'. A' refers to every possibility not covered by A, and B' refers to every possibility not covered by B.

You can find probabilities involving intersections by multiplying the probabilities of linked branches together. As an example, suppose you want to find P(A ∩ B). You can find this by multiplying P(B) and P(A | B) together. In other words, you multiply the probability on the first level B branch with the probability on the second level A branch.

[Figure: probability tree with first-level branches B and B', each followed by second-level branches A and A'.]

P(A ∩ B) = P(A | B) × P(B)

P(A' ∩ B) = P(A' | B) × P(B)

P(A ∩ B') = P(A | B') × P(B')

P(A' ∩ B') = P(A' | B') × P(B')


Using probability trees gives you the same results you saw earlier, and it’s up to you whether you use them or not. Probability trees can be time-consuming to draw, but they offer you a way of visualizing conditional probabilities.
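If drawing trees feels slow, you can hold one in a small data structure instead. The sketch below stores the roulette tree with its branch probabilities and multiplies along linked branches; the TREE layout and the intersection function are assumptions made for this illustration.

```python
from fractions import Fraction as F

# First-level branches hold P(color); second-level branches hold P(parity | color).
# Values follow the roulette tree; green pockets (0 and 00) count as neither odd nor even.
TREE = {
    "black": (F(18, 38), {"odd": F(8, 18), "even": F(10, 18)}),
    "red": (F(18, 38), {"odd": F(10, 18), "even": F(8, 18)}),
    "green": (F(2, 38), {}),
}

def intersection(first, second):
    """P(first and second) = P(first) x P(second | first), read along linked branches."""
    p_first, branches = TREE[first]
    return p_first * branches[second]

print(intersection("black", "even"))
print(intersection("red", "even"))
```

Each call walks one path from the root of the tree to a leaf, which is all that "multiplying linked branches" means.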
Probability Magnets

Duncan’s Donuts are looking into the probabilities of their customers buying donuts and coffee. They drew up a probability tree to show the probabilities, but in a sudden gust of wind, they all fell off. Your task is to pin the probabilities back on the tree. Here are some clues to help you.

P(Donuts) = 3/4

P(Coffee | Donuts') = 1/3

P(Donuts ∩ Coffee) = 9/20
Handy hints for working with trees

1. Work out the levels

Try and work out the levels of probability that you need. As an example, if you’re given a probability for P(A | B), you’ll probably need the first level to cover B, and the second level A.

2. Fill in what you know

If you’re given a series of probabilities, put them onto the tree in the relevant position.

3. Remember that each set of branches sums to 1

If you add together the probabilities for all of the branches that fork off from a common point, the sum should equal 1. Remember that P(A) = 1 - P(A').

4. Remember your formula

You should be able to find most other probabilities by using

P(A | B) = P(A ∩ B) / P(B)
Probability Magnets Solution

Your task was to pin the probabilities back on the tree, using these clues:

P(Donuts) = 3/4

P(Coffee | Donuts') = 1/3

P(Donuts ∩ Coffee) = 9/20

[Figure: completed probability tree. The Donuts branch is 3/4, followed by Coffee 3/5 and Coffee' 2/5. The Donuts' branch is 1/4, followed by Coffee 1/3 and Coffee' 2/3. Each set of branches must add up to 1.]
We haven’t quite finished with Duncan’s Donuts! Now that you’ve completed the probability tree, you need to use it to work out some probabilities.

1. P(Donuts')

2. P(Donuts' ∩ Coffee)

3. P(Coffee' | Donuts)

4. P(Coffee)

Hint: How many ways are there of getting coffee? (You can get coffee with or without donuts.)

5. P(Donuts | Coffee)
Exercise solution

Your job was to use the completed probability tree to work out some probabilities.

1. P(Donuts')

1/4

We can read this one off the tree. P(Donuts) = 3/4, so P(Donuts') must be 1/4.

2. P(Donuts' ∩ Coffee)

1/12

We can find this by multiplying together P(Coffee | Donuts') and P(Donuts'). We’ve just found P(Donuts') = 1/4, and looking at the tree, P(Coffee | Donuts') = 1/3. Multiplying these together gives 1/12.

3. P(Coffee' | Donuts)

2/5

We can read this off the tree.

4. P(Coffee)

8/15

This probability is tricky, so don’t worry if you didn’t get it. To find P(Coffee), we need to add together P(Coffee ∩ Donuts) and P(Coffee ∩ Donuts'). This gives us 1/12 + 9/20 = 8/15.

5. P(Donuts | Coffee)

27/32

You’ll only be able to do this if you found P(Coffee).

P(Donuts | Coffee) = P(Donuts ∩ Coffee) / P(Coffee)

This gives us (9/20) / (8/15) = 27/32.
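You can replay the Duncan’s Donuts working with exact fractions to check each step. This is only a sketch: the three starting values are the clues from the exercise, and the variable names are made up for the example.

```python
from fractions import Fraction as F

# The three clues from the exercise.
p_d = F(3, 4)              # P(Donuts)
p_c_given_not_d = F(1, 3)  # P(Coffee | Donuts')
p_d_and_c = F(9, 20)       # P(Donuts and Coffee)

p_not_d = 1 - p_d                          # branches from one point sum to 1
p_c_given_d = p_d_and_c / p_d              # P(Coffee | Donuts) from the conditional formula
p_not_c_given_d = 1 - p_c_given_d          # P(Coffee' | Donuts)
p_not_d_and_c = p_c_given_not_d * p_not_d  # multiply along the Donuts' -> Coffee path
p_c = p_d_and_c + p_not_d_and_c            # coffee with donuts, or coffee without
p_d_given_c = p_d_and_c / p_c              # flip the condition around

for name, value in [
    ("P(Donuts')", p_not_d),
    ("P(Donuts' and Coffee)", p_not_d_and_c),
    ("P(Coffee' | Donuts)", p_not_c_given_d),
    ("P(Coffee)", p_c),
    ("P(Donuts | Coffee)", p_d_given_c),
]:
    print(name, "=", value)
```

Every line uses one of the handy hints: branches summing to 1, multiplying along a path, or the conditional probability formula.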
Q: I still don’t get the difference between P(A ∩ B) and P(A | B).

A: P(A ∩ B) is the probability of getting both A and B. With this probability, you can make no assumptions about whether one of the events has already occurred. You have to find the probability of both events happening without making any assumptions.

P(A | B) is the probability of event A given event B. In other words, you make the assumption that event B has occurred, and you work out the probability of getting A under this assumption.

Q: So does that mean that P(A | B) is just the same as P(A)?

A: No, they refer to different probabilities. When you calculate P(A | B), you have to assume that event B has already happened. When you work out P(A), you can make no such assumption.


Vital Statistics
Conditions

P(A | B) = P(A ∩ B) / P(B)



Q: Is P(A | B) the same as P(B | A)? They look similar.

A: It’s quite a common mistake, but they are very different probabilities. P(A | B) is the probability of getting event A given event B has already happened. P(B | A) is the probability of getting event B given event A occurred. You’re actually finding the probability of a different event under a different set of assumptions.

Q: Are probability trees better than Venn diagrams?

A: Both diagrams give you a way of visualizing probabilities, and both have their uses. Venn diagrams are useful for showing basic probabilities and relationships, while probability trees are useful if you’re working with conditional probabilities. It all depends what type of problem you need to solve.

Q: Is there a limit to how many sets of branches you can have on a probability tree?

A: In theory there’s no limit. In practice you may find that a very large probability tree can become unwieldy, but you may still find it easier to draw a large probability tree than work through complex probabilities without it.

Q: If A and B are mutually exclusive, what is P(A | B)?

A: If A and B are mutually exclusive, then P(A ∩ B) = 0 and P(A | B) = 0. This makes sense because if A and B are mutually exclusive, it’s impossible for both events to occur. If we assume that event B has occurred, then it’s impossible for event A to happen, so P(A | B) = 0.
Bad luck!


You placed a bet that the ball would land in an even pocket 
given we’ve been told it’s black. Unfortunately, the ball landed 
in pocket 17, so you lose a few more chips. 

Maybe we can win some chips back with another bet. This 
time, the croupier says that the ball has landed in an even 
pocket. What’s the probability that the pocket is also black? 




But that’s a similar problem to the one we had before. Do you mean we have to draw another probability tree and work out a whole new set of probabilities? Can’t we use the one we had before?



We can reuse the probability calculations we 
already did. 

Our previous task was to figure out P(Even | Black), and we 
can use the probabilities we found solving that problem to 
calculate P(Black | Even). Here’s the probability tree we used 
before: 



Black (18/38)
    Odd (8/18)
    Even (10/18)
Red (18/38)
    Odd (10/18)
    Even (8/18)
Green (2/38)
We can find P(Black | Even) using the probabilities we already have


So how do we find P(Black | Even)? There’s still a way of calculating 
this using the probabilities we already have even if it’s not immediately 
obvious from the probability tree. All we have to do is look at the 
probabilities we already have, and use these to somehow calculate the 
probabilities we don’t yet know. 

Let’s start off by looking at the overall probability we need to find, P(Black | Even).

Using the formula for finding conditional probabilities, we have

P(Black | Even) = P(Black ∩ Even) / P(Even)

If we can find what the probabilities of P(Black ∩ Even) and P(Even) are, we’ll be able to use these in the formula to calculate P(Black | Even). All we need is some mechanism for finding these probabilities.

Use the probabilities you have to calculate the probabilities you need.

Sound difficult? Don’t worry, we’ll guide you through how to do it. 


Step 1: Finding P(Black ∩ Even)

Let’s start off with the first part of the formula, P(Black ∩ Even).

Sharpen your pencil

Take a look at the probability tree above. How can you use it to find P(Black ∩ Even)?

Hint: P(Black ∩ Even) = P(Even ∩ Black)
Sharpen your pencil Solution

Take a look at the probability tree above. How can you use it to find P(Black ∩ Even)?

You can find P(Black ∩ Even) by multiplying together P(Black) and P(Even | Black). This gives us

P(Black ∩ Even) = P(Black) × P(Even | Black)
                = 18/38 × 10/18
                = 10/38
                = 5/19


So where does this get us? 


We want to find the probability P(Black | Even). We can do this by evaluating

P(Black | Even) = P(Black ∩ Even) / P(Even)

So far we’ve only looked at the first part of the formula, P(Black ∩ Even), and you’ve seen that you can calculate this using

P(Black ∩ Even) = P(Black) × P(Even | Black)

so we can substitute P(Black) × P(Even | Black) for P(Black ∩ Even) in our original formula. This gives us

P(Black | Even) = P(Black) × P(Even | Black) / P(Even)

So how do we find the next part of the formula, P(Even)?

Take another look at the probability tree. How do you think we can use it to find P(Even)?
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Step Z: Finding P(Evew) 

The next step is to find the probability of the ball landing in an even 
pocket, P(Even). We can find this by considering all the ways in which 
this could happen. 

A ball can land in an even pocket by landing in either a pocket that’s 
both black and even, or in a pocket that’s both red and even. These are 
all the possible ways in which a ball can land in an even pocket. 

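This means that we find P(Even) by adding together P(Black ∩ Even)
and P(Red ∩ Even). In other words, we add the probability of the
pocket being both black and even to the probability of it being both red
and even. The relevant branches are highlighted on the probability tree.

[Probability tree with the Black, Even and Red, Even branches highlighted]

P(Even) = P(Black ∩ Even) + P(Red ∩ Even)
        = P(Black) x P(Even | Black) + P(Red) x P(Even | Red)
        = 18/38 x 10/18 + 18/38 x 8/18
        = 18/38
        = 9/19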
Step 3: Finding P(Black | Even)

Can you remember our original problem? We wanted to find
P(Black | Even) where

P(Black | Even) = P(Black ∩ Even) / P(Even)

We started off by finding an expression for P(Black ∩ Even)

P(Black ∩ Even) = P(Black) x P(Even | Black)


After that we moved on to finding an expression for P(Even), and 
found that 



P(Even) = P(Black) x P(Even | Black) + P(Red) x P(Even | Red) 


Putting these together means that we can calculate P(Black | Even) 
using probabilities from the probability tree 




P(Black | Even) = P(Black ∩ Even) / P(Even)

                = P(Black) x P(Even | Black) / (P(Black) x P(Even | Black) + P(Red) x P(Even | Red))

                = 5/19 ÷ 9/19

                = 5/9

We calculated these values earlier, so we can substitute in the results.


This means that we now have a way of finding new conditional 
probabilities using probabilities we already know — something that can 
help with more complicated probability problems. 
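
The three steps can be checked with a few lines of Python. This is only a sketch: the variable names are our own, and the branch probabilities are the ones read off the roulette probability tree (18 black and 18 red pockets out of 38, with 10 even black pockets and 8 even red pockets).

```python
from fractions import Fraction

# Branch probabilities read off the roulette probability tree.
p_black = Fraction(18, 38)
p_red = Fraction(18, 38)
p_even_given_black = Fraction(10, 18)
p_even_given_red = Fraction(8, 18)

# Step 1: P(Black ∩ Even) = P(Black) x P(Even | Black)
p_black_and_even = p_black * p_even_given_black

# Step 2: P(Even) = P(Black ∩ Even) + P(Red ∩ Even)
p_even = p_black_and_even + p_red * p_even_given_red

# Step 3: P(Black | Even) = P(Black ∩ Even) / P(Even)
p_black_given_even = p_black_and_even / p_even

print(p_black_and_even, p_even, p_black_given_even)  # 5/19 9/19 5/9
```

Using Fraction rather than decimals keeps every intermediate result exact, so the answers match the fractions worked out by hand.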

Let’s look at how this works in general. 
These results can be generalized to other problems

Imagine you have a probability tree showing events A and B like 
this, and assume you know the probability on each of the branches. 



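[Probability tree: the first set of branches shows A and A'; each of these leads to a second set of branches showing B and B']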


Now imagine you want to find P(A | B), and the information shown 
on the branches above is all the information that you have. How 
can you use the probabilities you have to work out P(A | B)? 

We can start with the formula we had before: 

P(A | B) = P(A ∩ B) / P(B)

Now we can find P(A ∩ B) using the probabilities we have on the
probability tree. In other words, we can calculate P(A ∩ B) using

P(A ∩ B) = P(A) x P(B | A)


But how do we find P(B)? 





Take a good look at the probability tree. How would you use it to find P(B)? 
Use the Law of Total Probability to find P(B)

To find P(B), we use the same process that we used to find P(Even) earlier; we 
need to add together the probabilities of all the different ways in which the 
event we want can possibly happen. 

There are two ways in which event B can occur: either with event A, or without
it. This means that we can find P(B) using:

P(B) = P(A ∩ B) + P(A' ∩ B)

We can rewrite this in terms of the probabilities we already know from the
probability tree. This means that we can use:

P(A ∩ B) = P(A) x P(B | A)

P(A' ∩ B) = P(A') x P(B | A')

This gives us:

P(B) = P(A) x P(B | A) + P(A') x P(B | A')

This is sometimes known as the Law of Total Probability, as it gives
a way of finding the total probability of a particular event based on
conditional probabilities.
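
Here is a minimal Python sketch of the Law of Total Probability. The function name total_probability is our own invention. As a check we reuse the roulette wheel with A = Black and B = Even, and we assume that "not black" covers the 18 red pockets plus the 2 green pockets, 8 of which are even, so P(Even | Black') = 8/20.

```python
from fractions import Fraction

def total_probability(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) = P(A) x P(B | A) + P(A') x P(B | A')."""
    p_not_a = 1 - p_a
    return p_a * p_b_given_a + p_not_a * p_b_given_not_a

# Roulette check: A = Black, B = Even.
# P(Black) = 18/38, P(Even | Black) = 10/18, P(Even | Black') = 8/20 (assumed, see above).
p_even = total_probability(Fraction(18, 38), Fraction(10, 18), Fraction(8, 20))
print(p_even)  # 9/19
```

The result agrees with the value of P(Even) found by adding the highlighted branches of the tree.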



Now that we have expressions for P(A ∩ B) and P(B), we can put
them together to come up with an expression for P(A | B).
Introducing Bayes’ Theorem

We started off by wanting to find P(A | B) based on probabilities
we already know from a probability tree. We already know P(A),
and we also know P(B | A) and P(B | A'). What we need is a
general expression for finding conditional probabilities that are
the reverse of what we already know, in other words P(A | B).



We started off with:

P(A | B) = P(A ∩ B) / P(B)

On page 127, we found P(A ∩ B) = P(A) x P(B | A). And on the
previous page, we discovered P(B) = P(A) x P(B | A) + P(A') x P(B | A').

If we substitute these into the formula, we get:

P(A | B) = P(A) x P(B | A) / (P(A) x P(B | A) + P(A') x P(B | A'))


Bayes’ Theorem is one of the most difficult aspects of probability.

Don’t worry if it looks complicated: this
is as tough as it’s going to get. And even
though the formula is tricky, visualizing the
problem can help.

This is called Bayes’ Theorem. It gives you a means of finding reverse
conditional probabilities, which is really useful if you don’t know every
probability up front.
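
If it helps to see the formula as something you can run, here is a small Python sketch. The function name bayes and its argument names are our own, and the second set of numbers is made up purely for illustration.

```python
from fractions import Fraction

def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) for an event A and its complement A'."""
    numerator = p_a * p_b_given_a                            # P(A ∩ B)
    denominator = numerator + (1 - p_a) * p_b_given_not_a   # P(B), by the Law of Total Probability
    return numerator / denominator

# Roulette: A = Black, B = Even. P(Even | Black') = 8/20 is an assumption that
# lumps the 18 red and 2 green pockets together as "not black".
p_black_given_even = bayes(Fraction(18, 38), Fraction(10, 18), Fraction(8, 20))
print(p_black_given_even)  # 5/9

# Made-up numbers: P(A) = 3/10, P(B | A) = 1/2, P(B | A') = 1/5.
print(bayes(Fraction(3, 10), Fraction(1, 2), Fraction(1, 5)))  # 15/29
```

Notice that the denominator is just the numerator plus the matching term for A', which is why the same product appears on the top and the bottom of the formula.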


[Annotated probability tree showing which branches are used to find P(A | B)]
Long Exercise


The Manic Mango games company is testing two brand-new games. They’ve asked a group of 
volunteers to choose the game they most want to play, and then tell them how satisfied they 
were with game play afterwards. 


80 percent of the volunteers chose Game 1, and 20 percent chose Game 2. Out of the Game 
1 players, 60 percent enjoyed the game and 40 percent didn't. For Game 2, 70 percent of the 
players enjoyed the game and 30 percent didn't. 

Your first task is to fill in the probability tree for this scenario. 
Manic Mango selects one of the volunteers at random to ask if she enjoyed playing the game, and she
says she did. Given that the volunteer enjoyed playing the game, what’s the probability that she played
game 2? Use Bayes’ Theorem.

Hint: What’s the probability of someone choosing game 2 and being satisfied?
What’s the probability of someone being satisfied overall? Once you’ve found
these, you can use Bayes’ Theorem to obtain the right answer.
Long Exercise Solution


The Manic Mango games company is testing two brand-new games. They’ve asked a group of 
volunteers to choose the game they most want to play, and then tell them how satisfied they 
were with game play afterwards. 


80 percent of the volunteers chose Game 1, and 20 percent chose Game 2. Out of the Game 
1 players, 60 percent enjoyed the game and 40 percent didn't. For Game 2, 70 percent of the 
players enjoyed the game and 30 percent didn't. 

Your first task is to fill in the probability tree for this scenario. 


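[Probability tree]

Game 1 (0.8): Satisfied 0.6, Dissatisfied 0.4
Game 2 (0.2): Satisfied 0.7, Dissatisfied 0.3

The first set of branches shows the probability of a player choosing each game.
We also know the probability of a player being satisfied or dissatisfied with
the game they chose.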
Manic Mango selects one of the volunteers at random to ask if she enjoyed playing the game, and she says she
did. Given that the volunteer enjoyed playing the game, what’s the probability that she played game 2? Use Bayes’
Theorem.


We need to use Bayes’ Theorem to find P(Game 2 | Satisfied). This means we need to use

P(Game 2 | Satisfied) = P(Game 2) P(Satisfied | Game 2) / (P(Game 2) P(Satisfied | Game 2) + P(Game 1) P(Satisfied | Game 1))

Let’s start with P(Game 2) P(Satisfied | Game 2).

We’ve been told P(Game 2) = 0.2 and P(Satisfied | Game 2) = 0.7. This means that

P(Game 2) P(Satisfied | Game 2) = 0.2 x 0.7
                                = 0.14

The next thing we need to find is P(Game 1) P(Satisfied | Game 1). We’ve been told that
P(Satisfied | Game 1) = 0.6 and also that P(Game 1) = 0.8. This means

P(Game 1) P(Satisfied | Game 1) = 0.8 x 0.6
                                = 0.48

Substituting this into the formula for Bayes’ Theorem gives us

P(Game 2 | Satisfied) = P(Game 2) P(Satisfied | Game 2) / (P(Game 2) P(Satisfied | Game 2) + P(Game 1) P(Satisfied | Game 1))
                      = 0.14 / (0.14 + 0.48)
                      = 0.14 / 0.62
                      = 0.226 (to 3 decimal places)
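
The same calculation works for any number of branches, not just two. The sketch below is one way to write that in Python. The function name posterior and the dictionary layout are our own choices, and the numbers are the ones given in the Manic Mango exercise.

```python
def posterior(priors, likelihoods):
    """Return P(choice | evidence) for every choice.

    priors:      {choice: P(choice)}
    likelihoods: {choice: P(evidence | choice)}
    """
    # Numerators of Bayes' Theorem: P(choice) x P(evidence | choice)
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    # Law of Total Probability: P(evidence) is the sum over every branch
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}

priors = {"Game 1": 0.8, "Game 2": 0.2}
satisfied = {"Game 1": 0.6, "Game 2": 0.7}

result = posterior(priors, satisfied)
print(round(result["Game 2"], 3))  # 0.226
```

Because every numerator is divided by the same total, the probabilities that come back always add up to 1.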
Vital Statistics

Law of Total Probability

If you have two events A and B, then

P(B) = P(B ∩ A) + P(B ∩ A')
     = P(A) P(B | A) + P(A') P(B | A')

The Law of Total Probability is the denominator of Bayes’ Theorem.
There are no Dumb Questions


Q: So when would I use Bayes’ Theorem?

A: Use it when you want to find
conditional probabilities that are in the
opposite order of what you’ve been given.

Q: Do I have to draw a probability tree?

A: You can either use Bayes’ Theorem
right away, or you can use a probability
tree to help you. Using Bayes’ Theorem
is quicker, but you need to make sure you
keep track of your probabilities. Using a
tree is useful if you can’t remember Bayes’
Theorem. It will give you the same result,
and it can keep you from losing track of
which probability belongs to which event.

Q: When we calculated P(Black | Even)
in the roulette wheel problem, we didn’t
include any probabilities for the ball
landing in a green pocket. Did we make a
mistake?

A: No, we didn’t. The only green pockets
on the roulette board are 0 and 00, and we
don’t classify these as even. This means that
P(Even | Green) is 0; therefore, it has no
effect on the calculation.

Q: The probability P(Black | Even) turns
out to be the same as P(Even | Black):
they’re both 5/9. Is that always the case?

A: True, it happens here that
P(Black | Even) and P(Even | Black) have the
same value, but that’s not necessarily true for
other scenarios.

If you have two events, A and B, you can’t
assume that P(A | B) and P(B | A) will
give you the same results. They are two
separate probabilities, and making this sort of
assumption could actually cost you valuable
points in a statistics exam. You need to use
Bayes’ Theorem to make sure you end up with
the right result.

Q: How useful is Bayes’ Theorem in real
life?

A: It’s actually pretty useful. For example,
it can be used in computing as a way of
filtering emails and detecting which ones
are likely to be junk. It’s sometimes used in
medical trials too.


We have a winner!

Congratulations, this time the ball landed on 10, a pocket 
that’s both black and even. You’ve won back some chips. 
It’s time for one last bet

Before you leave the roulette table, the croupier has 
offered you a great deal for your final bet, triple or 
nothing. If you bet that the ball lands in a black pocket 
twice in a row, you could win back all of your chips. 

Here’s the probability tree. Notice that the probabilities 
for landing on two black pockets in a row are a bit 
different than they were in our probability tree on page 
166, where we were trying to calculate the likelihood 
of getting an even pocket given that we knew the pocket 
was black. 



[Probability tree: the first spin branches into Black, Red, and Green; each of these branches again into Black, Red, and Green for the second spin]
If events affect each other, they are dependent


The probability of getting black followed by black is a slightly 
different problem from the probability of getting an even 
pocket given we already know it’s black. Take a look at the 
equation for this probability: 

P(Even | Black) = 10/18 = 0.556

For P(Even | Black), the probability of getting an even pocket 
is affected by the event of getting a black. We already know 
that the ball has landed in a black pocket, so we use this 
knowledge to work out the probability. We look at how many 
of the pockets are even out of all the black pockets. 



If we didn’t know that the ball had landed on a black pocket,
the probability would be different. To work out P(Even), we
look at how many pockets are even out of all the pockets.


P(Even) = 18/38 = 0.474

P(Even | Black) gives a different result from P(Even). In other
words, the knowledge we have that the pocket is black changes
the probability. These two events are said to be dependent.






In general terms, events A and B are said to be dependent if
P(A | B) is different from P(A). It’s a way of saying that the
probabilities of A and B are affected by each other.
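
You can test for dependence by counting pockets directly. The Python sketch below assumes the standard American wheel layout for which numbers are black (the BLACK set is our assumption, chosen to match the 10 even black pockets and 8 even red pockets used in this chapter), and the helper prob is our own.

```python
from fractions import Fraction

# Assumed layout: standard American wheel, numbers 1-36 plus green 0 and 00.
BLACK = {2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35}

pockets = []  # each pocket is (label, colour, is_even)
for n in range(1, 37):
    colour = "black" if n in BLACK else "red"
    pockets.append((str(n), colour, n % 2 == 0))
# The green pockets are not classified as even.
pockets += [("0", "green", False), ("00", "green", False)]

def prob(event, given=lambda pocket: True):
    """P(event | given), found by counting equally likely pockets."""
    pool = [p for p in pockets if given(p)]
    return Fraction(sum(1 for p in pool if event(p)), len(pool))

def is_even(pocket):
    return pocket[2]

def is_black(pocket):
    return pocket[1] == "black"

p_even = prob(is_even)                        # 18/38
p_even_given_black = prob(is_even, is_black)  # 10/18

# Dependent if knowing the pocket is black changes the probability of even.
print(p_even, p_even_given_black, p_even != p_even_given_black)  # 9/19 5/9 True
```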





Look at the probability tree on the previous page 
again. What do you notice about the sets of 
branches? Are the events for getting a black in the 
first game and getting a black in the second game 
dependent? Why? 
If events do not affect each other, they are independent


Not all events are dependent. Sometimes events remain completely 
unaffected by each other, and the probability of an event occurring 
remains the same irrespective of whether the other event happens or 
not. As an example, take a look at the probabilities of P(Black) and 
P(Black | Black). What do you notice?


P(Black) = 18/38 = 0.474

P(Black | Black) = 18/38 = 0.474


These two probabilities have the same value. In other words, the
event of getting a black pocket in this game has no bearing on the
probability of getting a black pocket in the next game. These events
are independent.


Independent events aren’t affected by each other. They don’t influence 
each other’s probabilities in any way at all. If one event occurs, the 
probability of the other occurring remains exactly the same. 



Well, you make no
difference to me either. I
don’t care whether you’re
there or not. I guess this
means we’re independent.

If events A and B are independent, then the probability of event A is 
unaffected by event B. In other words 


P(A | B) = P(A)


for independent events. 

We can also use this as a test for independence. If you have two events 
A and B where P(A | B) = P(A), then the events A and B must be 
independent. 
More on calculating probability for independent events


It’s easier to work out other probabilities for independent
events too, for example P(A ∩ B).

We already know that

P(A | B) = P(A ∩ B) / P(B)

If A and B are independent, P(A | B) is the same as P(A).
This means that

P(A) = P(A ∩ B) / P(B)

or

P(A ∩ B) = P(A) x P(B)

for independent events. In other words, if two events are
independent, then you can work out the probability of
getting both events A and B by multiplying their individual
probabilities together.


If A and B are mutually exclusive, they can’t be independent, and
if A and B are independent, they can’t be mutually exclusive.

If A and B are mutually exclusive,
then if event A occurs, event
B cannot. This means that the
outcome of A affects the outcome of
B, and so they’re dependent.

Similarly, if A and B are independent,
they can’t be mutually exclusive.



It’s time to calculate another probability. What’s the probability of 
the ball landing in a black pocket twice in a row? 
Sharpen your pencil Solution


It’s time to calculate another probability. What’s the probability of
the ball landing in a black pocket twice in a row?


We need to find P(Black in game 1 ∩ Black in game 2). As the events are independent, the result is

18/38 x 18/38 = 324/1444

= 0.224 (to 3 decimal places)
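
As a cross-check, the short Python sketch below gets the same answer two ways: by counting every pair of spins, and by multiplying the two probabilities together. The variable names are ours, and counting every pair as equally likely builds in the assumption that the two spins are independent.

```python
from fractions import Fraction
from itertools import product

# One spin: 18 black, 18 red, and 2 green pockets.
spin = ["black"] * 18 + ["red"] * 18 + ["green"] * 2

# All (first spin, second spin) pairs, each treated as equally likely.
pairs = list(product(spin, repeat=2))
both_black = sum(1 for first, second in pairs if first == second == "black")
p_counted = Fraction(both_black, len(pairs))

# Multiplication rule for independent events: P(A ∩ B) = P(A) x P(B)
p_black = Fraction(18, 38)
p_multiplied = p_black * p_black

print(p_counted, p_multiplied, round(float(p_multiplied), 3))  # 81/361 81/361 0.224
```

Python reduces 324/1444 to 81/361, which is the same fraction in its lowest terms.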



Q: What’s the difference between
being independent and being mutually
exclusive?

A: Imagine you have two events, A and B.

If A and B are mutually exclusive, then if
event A happens, B cannot. Also, if event B
happens, then A cannot. In other words, it’s
impossible for both events to occur.

If A and B are independent, then the
outcome of A has no effect on the outcome
of B, and the outcome of B has no effect on
the outcome of A. Their respective outcomes
have no effect on each other.

Q: Do both events have to be
independent? Can one event be
independent and the other dependent?

A: No. The two events are independent
of each other, so you can’t have two events
where one is dependent and the other one is
independent.

Q: Are all games on a roulette wheel
independent? Why?

A: Yes, they are. Separate spins of the
roulette wheel do not influence each other.
In each game, the probabilities of the ball
landing on a red, black, or green remain the
same.



Q: You’ve shown how a probability
tree can demonstrate independent events.
How do I use a Venn diagram to tell if
events are independent?

A: A Venn diagram really isn’t the
best way of showing dependence. Venn
diagrams are great if you need to examine
intersections and show mutually exclusive
events. They’re not great for showing
independence though.


Vital Statistics

Independence

If two events A and B are independent, then

P(A | B) = P(A)

If this holds for any two events, the events must
be independent. Also

P(A ∩ B) = P(A) x P(B)


The Case of the Two Classes 


The Head First Health Club prides itself on its ability to find a class for 
everyone. As a result, it is extremely popular with both young and old. 

The Health Club is wondering how best to market its new yoga class, 
and the Head of Marketing wonders if someone who goes swimming 
is more likely to go to a yoga class. “Maybe we could offer some sort of 
discount to the swimmers to get them to try out yoga.” 

The CEO disagrees. “I think you’re wrong,” he says. “I think
that people who go swimming and people who go to yoga are
independent. I don’t think people who go swimming are any
more likely to do yoga than anyone else.”

They ask a group of 96 people whether they go to the swimming 
or yoga classes. Out of these 96 people, 32 go to yoga and 72 go 
swimming. 24 people are exceptionally eager and go to both. 

So who’s right? Are the yoga and swimming classes
dependent or independent?
Fireside Chats

Tonight’s talk: Dependent and Independent discuss their
differences

Dependent: Independent, glad you could show up. I’ve been
wanting to catch up with you for some time.

Independent: Really, Dependent? How come?

Dependent: Well, I hear you keep getting fledgling statisticians
into trouble. They’re doing fine until you show up,
and then, whoa, wrong probabilities all over the
place! That ∩ guy has a particularly poor opinion
of you.

Independent: I’m a little hurt that ∩’s been saying bad things
about me; I thought I made life easy for him.
You want to work out the probability of getting
two independent events? Easy! Just multiply the
probabilities for the two events together and job
done.

Dependent: It’s that simplistic attitude of yours that gets people
into trouble. They think, “Hey, that Independent
guy looks easy. I’ll just use him for this probability.”
The next thing you know, ∩ has his probabilities all
in a twist. That’s just not the right way of dealing
with dependent events.

Independent: You’re blowing this all out of proportion. Even if
people do decide to use me instead of you, I don’t
see that it can make all that much difference.

Dependent: You don’t understand the seriousness of the
situation. If people use your way of calculating ∩’s
probability, and the events are dependent, they’re
guaranteed to get the wrong answer. That’s just not
good enough. For dependent events, you only
get the right answer if you take that | guy into
account: he’s a given.

Independent: I can’t say I pay all that much attention to him.
With independent events, probabilities just turn out
the same.
Dependent: You’re doing it again; you’re oversimplifying things.
Well, I’ve had enough. I think that people need to
think of me first instead of you; that would sort out
all of these problems.

Independent: Yeah? Like how?

Dependent: By really thinking through whether events are
dependent or not. Let me give you an example.
Suppose you have a deck of 52 cards, and thirteen
of them are diamonds. Imagine you choose a card
at random and it’s a diamond. What would be the
probability of that happening?

Independent: That’s easy. It’s 13/52, or 1/4.

Dependent: What if you pick out a second card? What’s the
probability of pulling out a second diamond?

Independent: It’s the same, isn’t it? 1/4.

Dependent: No! The events are dependent. You can no longer
say there are 13 diamonds in a pack of 52 cards.
You’ve just removed one diamond, so there are
12 diamonds left out of 51 cards. The probability
drops to 12/51, or 4/17.

Independent: Not fair, I assumed you put the first card back!
That would have meant the probability of getting a
diamond would have been the same as before, and I
would have been right. The events would have been
independent.

Dependent: But they weren’t. When people think about you
first, it leads them towards making all sorts of
inappropriate assumptions. No wonder ∩ gets so
messed up.

Independent: Well, thanks for the chat, Dependent, I’m glad we
had a chance to sort things out.

Dependent: Think nothing of it. Just make sure you think things
through a bit more carefully next time.
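
The card example from the chat is easy to check in Python. This is just a sketch with our own variable names; the only facts used are the 52 cards and 13 diamonds mentioned above.

```python
from fractions import Fraction

DIAMONDS, DECK = 13, 52

# First card: 13 diamonds out of 52 cards.
p_first = Fraction(DIAMONDS, DECK)

# Second card WITHOUT replacement: one diamond and one card have gone.
p_second_without = Fraction(DIAMONDS - 1, DECK - 1)

# Second card WITH replacement: the deck is back to how it started.
p_second_with = Fraction(DIAMONDS, DECK)

print(p_first, p_second_without, p_second_with)  # 1/4 4/17 1/4
print(p_second_without != p_first)  # True, so the draws are dependent
print(p_second_with == p_first)     # True, so the draws are independent

# Probability of two diamonds in a row without replacement.
print(p_first * p_second_without)   # 1/17
```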
Solved: The Case of the Two Classes


Are the yoga and swimming classes dependent or
independent?

The CEO’s right — the classes are independent. 

Here’s how he knows. 

32 people out of 96 go to yoga classes, so 

P(Yoga) = 1/3 

72 people go swimming, so 

P(Swimming) = 3/4 

24 people go to both classes, so 

P(Yoga ∩ Swimming) = 1/4

So how do we know the classes are independent? Let’s multiply 
together P(Yoga) and P(Swimming) and see what we get. 



P(Yoga) x P(Swimming) = 1/3 x 3/4 


=1/4 

As this is the same as P(Yoga ∩ Swimming), we know that the
classes are independent.
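
The CEO’s check can be written as a tiny Python function. The function name independent and its arguments are our own, and the second call uses a made-up head count purely for contrast. Bear in mind that with real survey data the two sides rarely match exactly, so an exact comparison like this is only a sketch of the idea.

```python
from fractions import Fraction

def independent(n_total, n_a, n_b, n_both):
    """True if P(A ∩ B) equals P(A) x P(B) for the given head counts."""
    p_a = Fraction(n_a, n_total)
    p_b = Fraction(n_b, n_total)
    p_both = Fraction(n_both, n_total)
    return p_both == p_a * p_b

# Health club figures: 96 people, 32 do yoga, 72 swim, 24 do both.
print(independent(96, 32, 72, 24))  # True

# Hypothetical figures: if 30 people did both, the classes would be dependent.
print(independent(96, 32, 72, 30))  # False
```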
Here are a bunch of situations and events. Your task is to say which of
these are dependent, and which are independent.

                                              Dependent   Independent

Throwing a coin and getting heads twice
in a row.


Removing socks from a drawer until you 
find a matching pair. 


Choosing chocolates at random from a box 
and picking dark chocolates twice in a row. 


Choosing a card from a deck of cards, and 
then choosing another one. 


Choosing a card from a deck of cards, 
putting the card back in the deck, and then 
choosing another one. 


The event of getting rain given it’s a 
Thursday. 
Here are a bunch of situations and events. Your task was to say which
of these are dependent, and which are independent.

Throwing a coin and getting heads twice in a row.
Independent. The second coin throw isn’t affected by the first.

Removing socks from a drawer until you find a matching pair.
Dependent.

Choosing chocolates at random from a box and picking dark chocolates twice in a row.
Dependent.

Choosing a card from a deck of cards, and then choosing another one.
Dependent.

Choosing a card from a deck of cards, putting the card back in the deck, and then
choosing another one.
Independent.

The event of getting rain given it’s a Thursday.
Independent. It’s no more or less likely to rain just because it’s Thursday,
so these two events are independent.
Winner! Winner!


On both spins of the wheel, the ball landed on 30, a red 
square, and you doubled your winnings. 

You’ve learned a lot about probability over at Fat Dan’s 
roulette table, and you’ll find this knowledge will come in 
handy for what’s ahead at the casino. It’s a pity you didn’t 
win enough chips to take any home with you, though. 

(Note from Fat Dan: That’s a relief.)


It’s great that we know our chances
of winning all these different bets, but
don’t we need to know more than just
probability to make smart bets?





Besides the chances of winning, you 
also need to know how much you 
stand to win in order to decide if the 
bet is worth the risk. 

Betting on an event that has a very low probability 
may be worth it if the payoff is high enough to 
compensate you for the risk. In the next chapter, 
we’ll look at how to factor these payoffs into our 
probability calculations to help us make more 
informed betting decisions. 
The Absent-Minded Diners

Three absent-minded friends decide to go out for a meal, but 
they forget where they’re going to meet. Fred decides to throw a 
coin. If it lands heads, he’ll go to the diner; tails, and he’ll go to the 
Italian restaurant. George throws a coin, too; heads, it’s the Italian 
restaurant; tails, it’s the diner. Ron decides he’ll just go to the 
Italian restaurant because he likes the food. 

What’s the probability all three friends meet? What’s the 
probability one of them eats alone? 




Here are some more roulette probabilities for you to work out. 


1. The probability of the ball having landed on the number 17 given the pocket is black. 


2. The probability of the ball landing on pocket number 22 twice in a row. 


3. The probability of the ball having landed in a pocket with a number greater than 4 given that 
it’s red. 


4. The probability of the ball landing in pockets 1, 2, 3, or 4.
The Absent-Minded Diners Solution

Three absent-minded friends decide to go out for a meal, but 
they forget where they’re going to meet. Fred decides to throw a 
coin. If it lands heads, he’ll go to the diner; tails, and he’ll go to the 
Italian restaurant. George throws a coin, too; heads, it’s the Italian 
restaurant; tails, it’s the diner. Ron decides he’ll just go to the 
Italian restaurant because he likes the food. 


What’s the probability all three friends meet? What’s the 
probability one of them eats alone? 






If all friends meet, it must be at the Italian
restaurant. We need to find

P(Ron Italian ∩ Fred Italian ∩ George Italian)
= 1 x 0.5 x 0.5 = 0.25

[Probability tree: Fred’s and George’s coin throws each branch to Diner or Italian]

1 person eats alone if Fred and George go to the Diner,
Fred goes to the Diner while George goes to the Italian
restaurant, or George goes to the Diner and Fred gets Italian.

(0.5 x 0.5) + (0.5 x 0.5) + (0.5 x 0.5) = 0.75
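
With so few outcomes, you can also list every combination of coin throws and add up the ones you care about. The Python sketch below does exactly that. The variable names are ours, and it assumes both coins are fair and thrown independently.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Where each friend goes for a given coin face. Ron ignores coins entirely.
fred = {"H": "Diner", "T": "Italian"}
george = {"H": "Italian", "T": "Diner"}
RON = "Italian"

all_meet = Fraction(0)
one_alone = Fraction(0)

for fred_coin, george_coin in product("HT", repeat=2):
    p = Fraction(1, 2) * Fraction(1, 2)  # two independent fair coin throws
    places = Counter([fred[fred_coin], george[george_coin], RON])
    if len(places) == 1:
        all_meet += p     # everyone chose the same place
    elif 1 in places.values():
        one_alone += p    # exactly one friend is on their own

print(all_meet, one_alone)  # 1/4 3/4
```

With three friends and only two restaurants, either everyone meets or exactly one person eats alone, which is why the two probabilities add up to 1.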



Here are some more roulette probabilities for you to work out. 


1. The probability of the ball having landed on the number 17 given the pocket is black.

There are 18 black pockets, and one of them is numbered 17.
P(17 | Black) = 1/18 = 0.0556 (to 4 decimal places)


2. The probability of the ball landing on pocket number 22 twice in a row.

We need to find P(22 ∩ 22). As these events are independent, this is
equal to P(22) x P(22). The probability of getting a 22 is 1/38, so

P(22 ∩ 22) = 1/38 x 1/38 = 1/1444 = 0.0007 (to 4 decimal places)


3. The probability of the ball having landed in a pocket with a number greater than 4 given that
it’s red.

P(Above 4 | Red) = 1 - P(4 or below | Red)

There are 2 red numbers below 4, so this gives us
1 - (1/18 + 1/18) = 8/9 = 0.889 (to 3 decimal places)


4. The probability of the ball landing in pockets 1, 2, 3, or 4.

The probability of each pocket is 1/38, so the probability of this event
is 4 x 1/38 = 4/38 = 0.105 (to 3 decimal places)
5 Using discrete probability distributions




Manage Your Expectations



OK, so falling out the tree was 
unexpected, but you have to take 
a long-term view of these things. 




Unlikely events happen, but what are the consequences? 

So far we’ve looked at how probabilities tell you how likely certain events are. What 
probability doesn’t tell you is the overall impact of these events, and what it means
to you. Sure, you’ll sometimes make it big on the roulette table, but is it really worth it 
with all the money you lose in the meantime? In this chapter, we'll show you how you 
can use probability to predict long-term outcomes, and also measure the certainty 
of these predictions. 


Back at Fat Dan’s Casino

Have you ever felt mesmerized by the 
flashing lights of a slot machine? Well, 
you’re in luck. At Fat Dan’s Casino, there’s 
a full row of shiny slot machines just waiting 
to be played. Let’s play one of them, which 
costs $1 per game (pull of the lever). Who
knows, maybe you’ll hit the jackpot!

The slot machine has three windows, and 
if all three windows line up in the right way, 
the cash will come cascading out. 


[Slot machine poster: $1 for each game]

$ $ $ = $20
$ $ cherry (any order) = $15
cherry cherry cherry = $10
lemon lemon lemon = $5
The amount of money 
you can win looks tempting, but 
I’d like to know the probability of
getting any of these combinations 
before playing. 


This sounds like something we can calculate. 
Here are the probabilities of a particular image 
appearing in a particular window: 


$      Cherry   Lemon   Other
0.1    0.2      0.2     0.5


The three windows are independent of each other, 
which means that the image that appears in one of 
the windows has no effect on the images that appear 
in any of the others. 


The probability of a cherry
appearing in this window is
0.2.
BE the Gambler

Take a look at the poster for the slot machine on the
facing page. Your job is to play like you’re the
gambler and work out the probability of getting
each combination on the poster. What’s the
probability of not winning anything?


Probability of $ $ $

Probability of $ $ cherry (any order)

Probability of lemon lemon lemon

Probability of cherry cherry cherry

Probability of winning nothing
BE the Gambler Solution

Take a look at the poster for the slot machine on the
facing page. Your job is to play like you’re the
gambler and work out the probability of getting
each combination on the poster. What’s the
probability of not winning anything?


Probability of $ $ $

P($, $, $) = P($) x P($) x P($)
           = 0.1 x 0.1 x 0.1
           = 0.001

The probability of a dollar sign appearing in a window is 0.1.


Probability of lemon lemon lemon

P(lemon, lemon, lemon) = P(lemon) x P(lemon) x P(lemon)
                       = 0.2 x 0.2 x 0.2
                       = 0.008

A lemon appearing in a window is independent of ones appearing in the
other two windows, so you multiply the three probabilities together.


Probability of $ $ cherry (any order)

There are three ways of getting this:

P($, $, cherry) + P($, cherry, $) + P(cherry, $, $)
= (0.1 x 0.1 x 0.2) + (0.1 x 0.2 x 0.1) + (0.2 x 0.1 x 0.1)
= 0.006


Probability of cherry cherry cherry

P(cherry, cherry, cherry) = P(cherry) x P(cherry) x P(cherry)
                          = 0.2 x 0.2 x 0.2
                          = 0.008


Probability of winning nothing

This means we get none of the winning combinations.

P(losing) = 1 - P($, $, $) - P($, $, cherry (any order)) - P(cherry, cherry, cherry) - P(lemon, lemon, lemon)
          = 1 - 0.001 - 0.006 - 0.008 - 0.008
          = 0.977

These are the four probability values we calculated above.
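
You can confirm these figures by listing all 4 x 4 x 4 = 64 ways the three windows can fall. The Python sketch below is one way to do it. The names WINDOW, classify, and distribution are our own, and the per-window probabilities are the ones from the slot machine table (0.1 for a dollar sign, 0.2 for a cherry, 0.2 for a lemon, 0.5 for anything else).

```python
from itertools import product
from math import prod

# Probability of each image appearing in a single window.
WINDOW = {"$": 0.1, "cherry": 0.2, "lemon": 0.2, "other": 0.5}

def classify(images):
    """Name the winning combination shown by three windows, or 'none'."""
    shown = sorted(images)  # sorting makes the order of the windows irrelevant
    if shown == ["$", "$", "$"]:
        return "dollars"
    if shown == ["$", "$", "cherry"]:
        return "dollars/cherry"
    if shown == ["cherry", "cherry", "cherry"]:
        return "cherries"
    if shown == ["lemon", "lemon", "lemon"]:
        return "lemons"
    return "none"

distribution = {}
for images in product(WINDOW, repeat=3):
    # The windows are independent, so multiply their probabilities.
    p = prod(WINDOW[image] for image in images)
    name = classify(images)
    distribution[name] = distribution.get(name, 0.0) + p

for name, p in distribution.items():
    print(name, round(p, 3))
```

Rounded to three decimal places, the totals come out as 0.001 for dollars, 0.006 for dollars/cherry, 0.008 each for cherries and lemons, and 0.977 for winning nothing.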
We can compose a probability distribution for the slot machine


Here are the probabilities of the different winning combinations 
on the slot machine. 





Combination    None     Lemons   Cherries   Dollars/cherry   Dollars
Probability    0.977    0.008    0.008      0.006            0.001


This looks useful, but I wonder if we can take it one
step further. We’ve found the probabilities of getting
each of the winning combinations, but what we’re
really interested in is how much we’ll win or lose.


We don’t just want to know the probability of
winning, we want to know how much we stand to
win.

The probabilities are currently written in terms of combinations of
symbols, which makes it hard to see at a glance what our gain will be.

We don’t have to write them like this though. Instead of writing the 
probabilities in terms of slot machine images, we can write them in 
terms of how much we win or lose on each game. All we need to do 
is take the amount we’ll win for each combination, and subtract the 
amount we’ve paid for the game. 



Combination   None    Lemons   Cherries   Dollars/cherry   Dollars
Gain          -$1     $4       $9         $14              $19
Probability   0.977   0.008    0.008      0.006            0.001

We lose $1 if we hit a losing combination.

These are the same probabilities as before.

Our gain for hitting a winning combination is the payoff minus the $1
we paid to play.


The table gives us the probability distribution of the 

winnings, a set of the probabilities for every possible gain 
or loss for our slot machine. 
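Here's a small Python sketch of that conversion from combinations to gains. The prize amounts are an assumption: they are inferred by adding the $1 stake back onto each gain in the table (so a $19 gain implies a $20 prize), and the variable names are made up for illustration.

```python
COST = 1  # price of one game, in dollars

# Assumed prizes, inferred as gain + $1 stake.
prizes = {"None": 0, "Lemons": 5, "Cherries": 10,
          "Dollars/cherry": 15, "Dollars": 20}

probabilities = {"None": 0.977, "Lemons": 0.008, "Cherries": 0.008,
                 "Dollars/cherry": 0.006, "Dollars": 0.001}

# Gain = prize - cost. The probabilities carry over unchanged.
gain_distribution = {prizes[name] - COST: p
                     for name, p in probabilities.items()}

for gain, p in sorted(gain_distribution.items()):
    print(f"{gain:>3}  {p:.3f}")
```

Keying the distribution by a number rather than by a symbol name is what makes the calculations later in the chapter possible.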
Probability Distributions Up Close


When you derived the probabilities of the slot machine, you calculated the 
probability of making each gain or loss. In other words, you calculated the 
probability distribution of a random variable, which is a variable that can
take on a set of values, where each value is associated with a specific probability.
In the case of Fat Dan’s slot machine, the random variable represents the 
amount we’ll gain in each game. 

When we want to refer to a random variable, it’s usual to represent it by a capital 
letter, like X or Y. The particular values that the variable can take are represented 
by a lowercase letter — for example, x or y. Using this notation, P(X = x) is a way 
of saying “the probability that the variable X takes a particular value x.” 

Here’s our slot machine probability distribution written using this notation: 

Combination   None    Lemons   Cherries   $s/cherry   Dollars
x             -1      4        9          14          19
P(X = x)      0.977   0.008    0.008      0.006       0.001

Here X is the variable, and each value it can take is represented by x.


The variable is discrete. This means that it can only take exact values. 




As well as giving a table of the probability distribution, we can also show the 
distribution on a chart to help us visualize it. Here is a bar chart showing the slot 
machine probabilities. 

[Bar chart: Slot Machine Probabilities. The probabilities for gains of 4, 9, 14, and 19 are so tiny that their bars barely show.]



Why should I care about probability 
distributions? All I want to know is 
how much I’ll win on the slot machine.
Can I calculate that? 


Once you’ve calculated a probability 
distribution, you can use this information 
to determine the expected outcome. 

In the case of Fat Dan’s slot machine, we can use our 
probability distribution to determine how much you can 
expect to win or lose long-term. 


Q: Why couldn’t we have just used the symbols instead of winnings? I’m
not sure we’ve really gained that much.

A: We could have, but we can do more things if we have numeric data
because we can use it in calculations. You’ll see shortly how we can use
numeric data to work out how much we can expect to win on each game, for
instance. We couldn’t have done that if we had just used symbols.

Q: What if I want to show probability distributions on a Venn diagram?

A: It’s not that appropriate to show probability distributions like
that. Venn diagrams and probability trees are useful if you want to
calculate probabilities. With a probability distribution, the
probabilities have already been calculated.

Q: Can you use any letter to represent a variable?

A: Yes, you can, as long as you don’t confuse it with anything else. It’s
most common to use letters towards the end of the alphabet, though, such
as X and Y.

Q: Should I use the same letter for the variable and the values? Would I
ever use X for the variable and y for the values?

A: Theoretically, there’s nothing to stop you, but in practice you’ll
find it more confusing if you use different letters. It’s best to stick
to using the same letter for each.

Q: You said that a discrete random variable is one where you can say
precisely what the values are. Isn’t that true of every variable?

A: No, it’s not. With the slot machine winnings, you know precisely what
the winnings are going to be for each symbol combination. You can’t get
any more precise, and it wouldn’t matter how many times you played. For
each game the possible values remain the same.

Sometimes you’re given a range of values where any value within the
range is possible. As an example, suppose you were asked to measure
pieces of string that are between 10 inches and 11 inches long. The
length could be literally any value within that range.

Don’t worry about the distinction too much for now; we’ll look at this
in more detail later on in the book. For now, every random variable we
look at will be discrete.
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expectation and variance of discrete probability distributions 


Expectation gives you a prediction of the results 


You have a probability distribution for the amount you could 
gain on the slot machines, but now you need to know how much 
you can expect to win or lose long-term. You can do this by 
calculating how much you can typically expect to win or lose in 
each game. In other words, you can find the expectation. 

The expectation of a variable X is a bit like the mean, but 
for probability distributions. You even calculate it in a similar 
way. To find the expectation, you multiply each value x by the 
probability of getting that value, and then sum the results. 


I’m the expectation. Treat me like I’m mean.


The expectation of a variable X is usually written E(X), but 
you’ll sometimes see it written as μ, the symbol for the mean.
Think of the expectation and mean as twins separated at birth. 


Here’s the equation for working out E(X):

E(X) = Σ x P(X = x)

Multiply each value by its probability, then add the whole lot together.


Let’s use this to calculate the expectation of the slot machine 
gain. Here’s a reminder of our probability distribution: 


x          -1      4       9       14      19
P(X = x)   0.977   0.008   0.008   0.006   0.001


E(X) = (-1 × 0.977) + (4 × 0.008) + (9 × 0.008) + (14 × 0.006) + (19 × 0.001)
     = -0.977 + 0.032 + 0.072 + 0.084 + 0.019
     = -0.77

-0.77 is the amount in dollars you can expect to gain on each pull of the lever.

In other words, over a large number of games, you can expect 
to lose $0.77 for each game. This means that if you played the
slot machine 100 times, you could expect to lose $77.
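The same calculation fits in a few lines of Python. This is only a sketch: the dictionary layout (gain mapped to probability) and the function name are choices made for illustration.

```python
def expectation(dist):
    """E(X): multiply each value by its probability, then add the results."""
    return sum(x * p for x, p in dist.items())


# Slot machine gains in dollars, mapped to their probabilities.
slot = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}

per_game = expectation(slot)
print(round(per_game, 2))        # -0.77
print(round(100 * per_game, 2))  # -77.0
```

Rounding is there only to hide floating-point noise; the arithmetic itself is exactly the sum shown above.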
[Bar chart: Slot Machine Probabilities, showing P(X = x) for x = -1, 4, 9, 14, 19.]



Probability distributions have variance. 

The expectation gives the typical or average value of 
a variable but it doesn’t tell you anything about how 
the values are spread out. For our slot machine, this 
will tell us more about the variation of our potential 
winnings. 

Just like we did in Chapter 3, we can use variance to 
measure this spread. Let’s see how we can do this. 


... and variance tells you about the spread of the results

The expectation tells you how much on average you can expect to win or 
lose with each game. If you lost this amount every single time, where would 
the fun be, and who would play? 

Just because you can expect to lose each time you play doesn’t mean there 
isn’t a small chance you’ll win big. Just like the mean, the expectation doesn’t 
give the full story as the amount you stand to gain on each game could vary 
a lot. How do you think we can measure this? 


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calculating variance for discrete probability distributions 


Variances and probability distributions 

Back in Chapter 3, we calculated the variance of a set of numbers. We worked
out (x - μ)² for each number, and then we took the average of these results.

We can do something similar to work out the variance of a variable X. Instead
of finding the average of (X - μ)², we find its expectation. We use this
formula:

Var(X) = E(X - μ)²

Here μ is the alternate way of writing E(X).


There’s just one problem: how do we find the expectation of (X - μ)²?


So how do we calculate E(X - μ)²?

Finding E(X - μ)² is actually quite similar to finding E(X).

When we calculate E(X), we take each value in the probability distribution, 
multiply it by its probability, and then add the results together. In other 
words, we use the calculation 


E(X) = Σ x P(X = x)


When we calculate the variance of X, we calculate (x - μ)² for every value
x, multiply it by the probability of getting that value x, and then add the
results together.


E(X - μ)² = Σ (x - μ)² P(X = x)


In other words, instead of multiplying x by its probability, you multiply
(x - μ)² by the probability of getting that value of x.


Var(X) measures how 
widely my payouts vary. 




Let’s calculate the slot machine’s variance


Let’s see if we can use this to calculate the variance of
the slot machine. To do this, we subtract μ from each
value, square the result, and then multiply each one by
the probability. As a reminder, E(X) or μ is -0.77.

Here’s a reminder of the slot machine probabilities.

x          -1      4       9       14      19
P(X = x)   0.977   0.008   0.008   0.006   0.001

We found μ = -0.77 earlier.


Var(X) = E(X - μ)²
       = (-1 + 0.77)² × 0.977 + (4 + 0.77)² × 0.008 + (9 + 0.77)² × 0.008 + (14 + 0.77)² × 0.006 + (19 + 0.77)² × 0.001
       = (-0.23)² × 0.977 + 4.77² × 0.008 + 9.77² × 0.008 + 14.77² × 0.006 + 19.77² × 0.001
       = 0.0516833 + 0.1820232 + 0.7636232 + 1.3089174 + 0.3908529
       = 2.6971

Each term has the form (x - μ)² P(X = x).


This means that while the expectation of our winnings is -0.77, the 
variance is 2.6971. 
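If you want to check the arithmetic, here's a short Python sketch of the variance calculation. The helper names and the dictionary layout are illustrative choices, not anything prescribed by the formula.

```python
def expectation(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x), where mu = E(X)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


slot = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}

print(round(variance(slot), 4))  # 2.6971
```

Notice that the squaring happens inside the sum, once per value, before anything is weighted by its probability.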



What about the standard deviation? 
Can we calculate that too? 



As well as having a variance, probability 
distributions have a standard deviation. 


It serves a similar function to the standard deviation of a set of values. 
It’s a way of measuring how far away from the center you can expect 
your values to be. 

As before, the standard deviation is calculated by taking the square 
root of the variance like this: 


σ = √Var(X)

We use the same symbol σ as we did for the standard deviation of a set
of values.


This means that the standard deviation of the slot machine winnings is
√2.6971, or 1.642. Roughly speaking, our winnings per game typically
differ from the expectation of -0.77 by about 1.642.
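As a quick check, the standard deviation is one square root away from the variance. This Python sketch recomputes both from the distribution; the variable names are just for illustration.

```python
import math

slot = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}

mu = sum(x * p for x, p in slot.items())               # expectation
var = sum((x - mu) ** 2 * p for x, p in slot.items())  # variance
sigma = math.sqrt(var)                                 # standard deviation

print(round(sigma, 3))  # 1.642
```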





Would you prefer to play on a slot machine 
with a high or low variance? Why? 
There are no Dumb Questions

Q: So expectation is a lot like the mean. Is there anything for
probability distributions that’s like the median or mode?

A: You can work out the most likely value, which would be a bit like the
mode, but you won’t normally have to do this. When it comes to
probability distributions, the measure that statisticians are most
interested in is the expectation.

Q: Shouldn’t the expectation be one of the values that X can take?

A: It doesn’t have to be. Just as the mean of a set of values isn’t
necessarily the same as one of the values, the expectation of a
probability distribution isn’t necessarily one of the values X can take.

Q: Are the variance and standard deviation the same as we had before
when we were dealing with values?

A: They’re the same, except that this time we’re dealing with
probability distributions. The variance and standard deviation of a set
of values are ways of measuring how far values are spread out from the
mean.

The variance and standard deviation of a probability distribution
measure how the probabilities of particular values are dispersed.

Q: I find the concept of E(X - μ)² confusing. Is it the same as finding
E(X - μ) and then squaring the end result?

A: No, these are two different calculations. E(X - μ)² means that you
find the square of X - μ for each value of X, and then find the
expectation of all the results. If you calculate E(X - μ) and then
square the result, you’ll get a completely different answer.

Technically speaking, you’re working out E((X - μ)²), but it’s not often
written that way.

Q: So what’s the difference between a slot machine with a low variance
and one with a high variance?

A: A slot machine with a high variance means that there’s a lot more
variability in your overall winnings. The amount you could win overall
is less predictable.

In general, the smaller the variance is, the closer your average
winnings per game are likely to be to the expectation. If you play on a
slot machine with a larger variance, your overall winnings will be less
reliable.



Vital Statistics
Expectation

Use the following formula to find the expectation of a variable X:

E(X) = Σ x P(X = x)


Vital Statistics
Variance

Use the following formula to calculate the variance:

Var(X) = E(X - μ)²




Here’s the probability distribution of a random variable X: 


x          1     2      3      4     5
P(X = x)   0.1   0.25   0.35   0.2   0.1


1. What’s the value of E(X)?


2. What’s the value of Var(X)? 


Exercise Solution

Here’s the probability distribution of a random variable X: 


x          1     2      3      4     5
P(X = x)   0.1   0.25   0.35   0.2   0.1


1. What’s the value of E(X)?

E(X) = Σ x P(X = x)
     = 1 × 0.1 + 2 × 0.25 + 3 × 0.35 + 4 × 0.2 + 5 × 0.1
     = 0.1 + 0.5 + 1.05 + 0.8 + 0.5
     = 2.95


2. What’s the value of Var(X)?

Var(X) = E(X - μ)²
       = Σ (x - μ)² P(X = x)
       = (1 - 2.95)² × 0.1 + (2 - 2.95)² × 0.25 + (3 - 2.95)² × 0.35 + (4 - 2.95)² × 0.2 + (5 - 2.95)² × 0.1
       = (-1.95)² × 0.1 + (-0.95)² × 0.25 + (0.05)² × 0.35 + (1.05)² × 0.2 + (2.05)² × 0.1
       = 3.8025 × 0.1 + 0.9025 × 0.25 + 0.0025 × 0.35 + 1.1025 × 0.2 + 4.2025 × 0.1
       = 0.38025 + 0.225625 + 0.000875 + 0.2205 + 0.42025
       = 1.2475





Five Minute Mystery



The Case of the Moving Expectation 

Statsville broadcasts a number of popular quiz shows, and among these 
is Seal or No Seal. In this show, the contestant is shown a number of 
boxes containing different amounts of money, and they have to choose 
one of them, without looking inside. The remaining boxes are opened 
one by one, and with each one that’s opened, the contestant is offered 
the chance to keep the money in the box they’ve chosen, sight unseen, 
or accept another offer based on the amount of money contained 
in the rest of the unopened boxes. The Statsville Seal Sanctuary
gets a donation based on any winnings the contestant gets.


The latest contestant is an amateur statistician, and he figures 
he’ll be in a better position to win if he knows what the expectation 
is of all the boxes. He’s just finished calculating the expectation when 
the producer comes over to him. 


“You’re on in three minutes,” says the producer, “and we’ve changed all 
the values in the boxes. They’re now worth twice as much, minus $10.”

The contestant stares at the producer in horror. Are all his calculations 
for nothing? He can’t possibly work out the expectation from scratch in 
three minutes. What should he do? 

How can the contestant figure out the new expectation
in record time?
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a new probability distribution 


Fat Dan changed his prices


In the past few minutes, Fat Dan has changed the cost
and prizes of the slot machine. Here’s the new lineup.

$2 for each game

$ $ $ = $100
$ $ cherry (any order) = $75
cherry cherry cherry = $50
lemon lemon lemon = $25


The cost of one game (pull of the lever) on the slot machine 
is now $2 instead of $1, but the prizes are now five times
greater. If we win, we’ll be able to make a lot more money 
than before. 

Here’s the new probability distribution. 


y          -2      23      48      73      98
P(Y = y)   0.977   0.008   0.008   0.006   0.001

Now pays 5 times more!
If we knew what the expectation
and variance were, we’d have an idea
of how much we could win long-term.












































What’s the expectation and variance of the new probability
distribution? How do these values compare to the previous payout
distribution’s expectation of -0.77 and variance of 2.6971?


y          -2      23      48      73      98
P(Y = y)   0.977   0.008   0.008   0.006   0.001
Sharpen your pencil solution

What’s the expectation and variance of the new probability
distribution? How do these values compare to the previous payout
distribution’s expectation of -0.77 and variance of 2.6971?


y          -2      23      48      73      98
P(Y = y)   0.977   0.008   0.008   0.006   0.001


E(Y) = (-2) × 0.977 + 23 × 0.008 + 48 × 0.008 + 73 × 0.006 + 98 × 0.001
     = -1.954 + 0.184 + 0.384 + 0.438 + 0.098
     = -0.85

Var(Y) = E(Y - μ)²
       = Σ (y - μ)² P(Y = y)
       = (-2 + 0.85)² × 0.977 + (23 + 0.85)² × 0.008 + (48 + 0.85)² × 0.008 + (73 + 0.85)² × 0.006 + (98 + 0.85)² × 0.001
       = (-1.15)² × 0.977 + (23.85)² × 0.008 + (48.85)² × 0.008 + (73.85)² × 0.006 + (98.85)² × 0.001
       = 1.3225 × 0.977 + 568.8225 × 0.008 + 2386.3225 × 0.008 + 5453.8225 × 0.006 + 9771.3225 × 0.001
       = 1.2920825 + 4.55058 + 19.09058 + 32.722935 + 9.7713225
       = 67.4275


The expectation is slightly lower, so in the long term we can expect to
lose $0.85 each game. The variance is much larger. This means that we
stand to lose more money in the long term on this machine, but there’s
less certainty.
Do you mean to tell me we have to
run through complicated calculations 
each time Fat Dan changes his prices? 


The old and new gains are related. 


The cost of each game has gone up to $2, and the prizes are
now five times higher than they were. As there’s a relationship 
between the old and new gains, maybe their expectations and 
variance are related too. 

Let’s find the relationship. 


















Pool Puzzle


It's time for a bit of algebra. Your job is 
to take numbers from the pool and 
place them into the blank lines in 
the calculations. You may not use 
the same number more than once, 
and you won’t need to use all the 
numbers. Your goal is to come up 
with an expression for the new gains 
on the slot machine in terms of the old. X 
represents the old gains, Y the new. 


X = (original win) - (original cost)
  = (original win) - ___
(original win) = ___ + ___


Y = 5(original win) - (new cost)
  = 5(___ + ___) - ___
  = 5___ + ___ - ___
  = ___ + ___



Note: each thing from 
Pool Puzzle Solution

It’s time for a bit of algebra. Your job is
to take numbers from the pool and
place them into the blank lines in
the calculations. You may not use
the same number more than once,
and you won’t need to use all the
numbers. Your goal is to come up
with an expression for the new gains
on the slot machine in terms of the old. X
represents the old gains, Y the new.


X = (original win) - (original cost)
  = (original win) - 1
(original win) = X + 1

Y = 5(original win) - (new cost)
  = 5(X + 1) - 2
  = 5X + 5 - 2
  = 5X + 3

In the second calculation we substitute in our expression for the
original win, and the new cost is $2. This gives us the new winnings in
terms of the original winnings X.

There’s a definite relationship between X and Y.


Note: each thing from 


There's a linear relationship between E(X) and E(Y) 

We’ve found that we can relate the new gains to the old using 
Y = 5X + 3, where Y refers to the new gains, and X refers to 
the old. What we want to do now is see if there’s a relationship 
between E(X) and E(Y), and Var(X) and Var(Y). 

If there is a relationship, this will save us lots of time if Fat 
Dan changes his prices again. As long as we know what the 
relationship is between the old and the new, we’ll be able to 
quickly calculate the new expectation and variance. 
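Before hunting for the pattern, it's worth confirming that Y = 5X + 3 really does turn the old gains into the new ones. Here's a minimal Python sketch; the dictionary layout and names are illustrative.

```python
# Old gains X and their probabilities.
old = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}

# Apply Y = 5X + 3 to every value; each probability stays attached
# to its (transformed) value.
new = {5 * x + 3: p for x, p in old.items()}

print(sorted(new))  # [-2, 23, 48, 73, 98]

# Expectation of the new gains, worked out the long way.
e_new = sum(y * p for y, p in new.items())
print(round(e_new, 2))  # -0.85
```

The transformed values match the new probability distribution exactly, which is what makes a shortcut plausible in the first place.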



Let’s see whether there’s a pattern in the relationship between 
E(X) and E(Y), and Var(X) and Var(Y). 


1. E(X) is -0.77 and E(Y) = -0.85. What is 5 × E(X)? What is 5 × E(X) + 3? How does this relate to E(Y)?


2. Var(X) = 2.6971 and Var(Y) = 67.4275. What is 5 × Var(X)? What is 5² × Var(X)? How does this relate to Var(Y)?


3. How could you generalize this for any probability distribution where Y = aX + b?
Another sharpen solution


Let’s see whether there’s a pattern in the relationship between
E(X) and E(Y), and Var(X) and Var(Y).

1. E(X) is -0.77 and E(Y) = -0.85. What is 5 × E(X)? What is 5 × E(X) + 3? How does this relate to E(Y)?

5 × E(X) = -3.85
5 × E(X) + 3 = -0.85
E(Y) = 5 × E(X) + 3

2. Var(X) = 2.6971 and Var(Y) = 67.4275. What is 5 × Var(X)? What is 5² × Var(X)? How does this relate to Var(Y)?

5 × Var(X) = 13.4855
5² × Var(X) = 67.4275
Var(Y) = 5² × Var(X)

3. How could you generalize this for any probability distribution where Y = aX + b?

E(aX + b) = aE(X) + b
Var(aX + b) = a² Var(X)


Slot machine transformations

So what did you accomplish over the past few pages? 

First of all, you found the expectation and variance of X, where 
X is the amount of money you stand to make in each game. 

You then wanted to know the effect of Fat Dan’s price changes 
but without having to recalculate the expectation and variance 
from scratch. You did this by working out the relationship 
between the old and the new gains, and then using the 
relationship to work out the new expectation and variance. 

You found that: 


E(5X + 3) = 5E(X) + 3
Var(5X + 3) = 5² Var(X)


Now pays 5 times more!


General formulas for linear transforms 


We can generalize this for any random variable. For any
random variable X:


E(aX + b) = aE(X) + b

Multiply the expectation by a, then add b.


Var(aX + b) = a² Var(X)

Square a and multiply it by the variance.


This is called a linear transform, as we are dealing with
a linear change to X. In other words, the underlying
probabilities stay the same but the values are changed into
new values of the form aX + b.
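One way to convince yourself that the two formulas hold is to transform a distribution the long way and compare the results with the shortcuts. The Python sketch below does that for a few arbitrary choices of a and b; the helper names are made up for illustration, and the check is a numerical sanity test rather than a proof.

```python
import math


def expectation(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def linear_transform(dist, a, b):
    """Distribution of aX + b: the values change, the probabilities do not."""
    out = {}
    for x, p in dist.items():
        y = a * x + b
        out[y] = out.get(y, 0.0) + p  # merge values if a == 0 collapses them
    return out


slot = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}

for a, b in [(5, 3), (2, -10), (-1, 0)]:
    y = linear_transform(slot, a, b)
    # Long way on the left, shortcut on the right.
    assert math.isclose(expectation(y), a * expectation(slot) + b, abs_tol=1e-9)
    assert math.isclose(variance(y), a ** 2 * variance(slot), abs_tol=1e-9)

print("E(aX + b) = aE(X) + b and Var(aX + b) = a^2 Var(X) hold for these cases")
```

The case a = -1 is a useful reminder that the variance uses a squared, so flipping the sign of every value leaves the spread unchanged.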


There are no Dumb Questions


Q: Do a and b have to be constants?

A: Yes they do. If a and b are variables, then this result won’t hold
true.

Q: Where did the b go in the variance?

A: Adding a constant value to the distribution makes no difference
to the overall variance, only to the expectation.

When you add a constant to a variable, it in effect moves the
distribution along while keeping the same basic shape. This means
that the expectation shifts along by b, but as the shape remains
unchanged, the variance stays the same.

Q: I’m surprised I have to multiply the variance by a². Why’s that?

A: When you multiply a variable by a constant, you multiply all its
underlying values by that constant.

When you calculate the variance, you perform calculations based
on the square of the underlying values. And as these have been
multiplied by a, the end result is that you multiply the variance by a².

Q: Do I really have to remember how to do linear transforms?
Are they important?

A: Yes, they are. They can save you a lot of time in the long
run, as they eliminate the need for you to have to calculate the
expectation and variance of a probability distribution every time the
values change. Rather than calculating a new probability distribution,
then calculating the expectation and variance from scratch, you can
just plug the expectation and variance you already calculated into the
equations above.

Knowing linear transforms can also help you out in exams. First of
all, you can save valuable time if you know what shortcuts you can
take. Furthermore, exam papers don’t always give you the underlying
probability distribution. You might be told the expectation of a variable,
and you may have to transform it based on very basic information.

Q: I tried calculating the expectation and variance the long
way round and came up with a different answer. Why?

A: You’ve seen by now that it’s easy to make mistakes when you
calculate expectations and variances. If you calculate these longhand,
there’s a good chance you made a mistake somewhere along the line.
You’re always better off using statistical shortcuts where possible.
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mystery solved! 


Solved: The Case of the Moving Expectation. 

How can the contestant figure out the new expectation in record
time?


The contestant looks around in panic for a brief moment and 
then relaxes. The change in values isn’t such a big problem 
after all. 

The contestant has already spent time calculating the
expectation of the original values of all the boxes, and this has
given him an idea of how much money is available for him to win.

The producer has told him that the new prizes are ten dollars less than twice
the original prizes. In other words, this is a linear transform. If X represents the
original prize money and Y the new, the values are transformed using Y = 2X - 10.

The contestant finds E(Y) using E(2X - 10) = 2E(X) - 10. This means that all
he has to do to find the new expectation is double his original expectation and
subtract 10.


Vital Statistics
Linear Transforms

If you have a variable X and numbers a and b, then:

E(aX + b) = aE(X) + b
Var(aX + b) = a² Var(X)


BULLET POINTS

■ Probability distributions describe the probability of all
possible outcomes of a given variable.

■ The expectation is the expected average long-term
outcome. It’s represented as either E(X) or μ, and is
calculated using E(X) = Σ x P(X = x).

■ The expectation of a function of X is given by
E(f(X)) = Σ f(x) P(X = x)

■ The variance of a probability distribution is given by
Var(X) = E(X - μ)²

■ The standard deviation of a probability distribution
is given by σ = √Var(X)

■ Linear transforms are when a variable X is transformed
into aX + b, where a and b are constants. The expectation
and variance are given by:

E(aX + b) = aE(X) + b
Var(aX + b) = a² Var(X)



So do linear transforms give me a quick way 
of calculating the expectation and variance 
when I want to play multiple games? 


There’s a difference between using linear transforms and 
playing multiple games. 

With linear transforms, all of the probabilities stay the same, but the possible values 
change. The values are transformed, but not the probabilities. There are still the same 
number of possible values. 

When you play multiple games, both the values and the probabilities are different, 
and even the number of possible values can change. It’s not possible to just transform 
the values, and working out the probabilities can quickly become complicated. 

Let’s look at a simple example. Imagine you were playing on a very simple slot 
machine with probability distribution X. 


x          -1    5
P(X = x)   0.9   0.1


Now pays double!


To find the probability distribution of 2X, you just need
to multiply the x values by 2. The underlying values
change because the potential gains have doubled. The
probabilities are the same as before.


2x           -2    10
P(2X = 2x)   0.9   0.1




What if you were going to play two games on 
the slot machine? You’d need to work out the 
probability distribution from scratch by considering 
all the possible outcomes from both games. 

w          -2     4      10
P(W = w)   0.81   0.18   0.01

W = -2 if you lose both games.
W = 4 if you win one of the two games.
W = 10 if you win both games.


This time, both the probabilities and values have
changed. So how can we find the expectation and
variance for this situation?
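To see where the two-game table comes from, here's a rough Python sketch that builds it by pairing every outcome of the first game with every outcome of the second. It assumes the two games are independent, so the probabilities of each pair multiply; the names are illustrative.

```python
from itertools import product

# One game on the simple slot machine: gain mapped to probability.
game = {-1: 0.9, 5: 0.1}

two_games = {}
for (x1, p1), (x2, p2) in product(game.items(), repeat=2):
    total = x1 + x2  # combined gain over both games
    # Independent games: multiply the probabilities, then pool any
    # pairs that give the same total (lose-win and win-lose both give 4).
    two_games[total] = two_games.get(total, 0.0) + p1 * p2

for w, p in sorted(two_games.items()):
    print(f"{w:>3}  {p:.2f}")
```

Two outcomes per game give four pairs but only three distinct totals, which is why the number of possible values changes as well as the probabilities.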
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introducing independent observations 


Every pull of the lever is an independent observation


When we play multiple games on the slot machine, each game
is called an event, and the outcome of each game is called an
observation. Each observation has the same expectation and
variance, but their outcomes can be different. You may not gain
the same amount in each game.

We need some way of differentiating between the different
games or observations. If the probability distribution of the slot
machine gains is represented by X, we call the first observation
X₁ and the second observation X₂.


Each game is called an event.
The outcome of each game is called an observation.


X₁ and X₂ have the same probabilities, possible values,
expectation and variance as X. In other words, they have the
same probability distribution, even though they are separate
observations and their outcomes can be different.


x₁            -1    5
P(X₁ = x₁)    0.9   0.1


x₂            -1    5
P(X₂ = x₂)    0.9   0.1


When we want to find the expectation and variance of two
games on the slot machine, what we really want to find is
the expectation and variance of X₁ + X₂. Let’s take a look at
some shortcuts.


Observation shortcuts 


Let’s find the expectation and variance of X₁ + X₂.


Expectation

First of all, let’s deal with E(X₁ + X₂).

E(X₁ + X₂) = E(X₁) + E(X₂)
           = E(X) + E(X)
           = 2E(X)

E(X₁) and E(X₂) are both equal to E(X), as X₁ and X₂ follow the same
distribution as X.

In other words, if we want the expectation of two observations, we
multiply E(X) by 2. This means that if we were to play two games
on a slot machine where E(X) = -0.77, the expectation would be
-0.77 × 2, or -1.54.

We can extend this to deal with multiple observations. If we want to
find the expectation of n observations, we can use

E(X₁ + X₂ + ... + Xₙ) = nE(X)


Watch it! X₁ + X₂ is not the same as 2X.
X₁ + X₂ means you are considering two observations of X. 2X means
you have one observation, but the possible values have doubled.


Variance

So what about Var(X₁ + X₂)? Because the observations are independent,
their variances add. Here’s the calculation.

Var(X₁ + X₂) = Var(X₁) + Var(X₂)
             = Var(X) + Var(X)
             = 2Var(X)

This means that if we were to play two games on a slot machine where
Var(X) = 2.6971, the variance would be 2.6971 × 2, or 5.3942.

We can extend this for any number of independent observations. If we
have n independent observations of X,

Var(X₁ + X₂ + ... + Xₙ) = nVar(X)


In other words, to find the expectation and variance of multiple
independent observations, just multiply E(X) and Var(X) by the number
of observations.
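To see the difference between X₁ + X₂ and 2X in numbers, the Python sketch below builds both distributions for a simple game and compares their expectations and variances, then checks the n-observation shortcuts for n = 3. The helper names are invented for illustration, and the observations are assumed to be independent.

```python
import math


def expectation(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def add_independent(d1, d2):
    """Distribution of the sum of two independent variables."""
    out = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out


def sum_of_observations(dist, n):
    """Distribution of X1 + X2 + ... + Xn for n independent observations."""
    total = {0: 1.0}  # the sum of zero games is 0 with certainty
    for _ in range(n):
        total = add_independent(total, dist)
    return total


game = {-1: 0.9, 5: 0.1}

two_games = sum_of_observations(game, 2)        # X1 + X2
doubled = {2 * x: p for x, p in game.items()}   # 2X

# Same expectation, different variance.
print(expectation(two_games), expectation(doubled))
print(variance(two_games), variance(doubled))

three_games = sum_of_observations(game, 3)
assert math.isclose(expectation(three_games), 3 * expectation(game))
assert math.isclose(variance(three_games), 3 * variance(game))
```

Both versions share the expectation 2E(X), but the variance of 2X is 4Var(X) while the variance of X₁ + X₂ is only 2Var(X), because in two separate games a win in one can offset a loss in the other.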
There are no Dumb Questions


Q: Isn’t E(X₁ + X₂) the same as E(2X)?

A: They look similar but they’re actually two different concepts.

With E(2X), you want to find the expectation of a variable where the
underlying values have been doubled. In other words, there’s only one
variable, but the values are twice the size.

With E(X₁ + X₂), you’re looking at two separate instances of X, and
you’re looking at the joint expectation. As an example, if X represents
the distribution of a game, then X₁ + X₂ represents the distribution of
two games.

Q: So are X₁ and X₂ the same?

A: They follow the same distribution, but they’re different instances or
observations. As an example, X₁ could refer to game 1, and X₂ to game 2.
They both have the same probability distribution, but the actual outcome
of each might be different.

Q: I see that the new variance is nVar(X) and not n² Var(X) like we had
for linear transforms. Why’s that?

A: This time we have a series of independent observations, all
distributed the same way. This means that we can find the overall
variance by adding the variance of each one together. If we have n
independent observations, then this gives us nVar(X).

When we calculate Var(nX), we multiply the underlying values by n. As
the variance is formed by squaring the underlying values, this means
that the resulting variance is n² Var(X).


Vital Statistics

Use the following formulas to calculate the expectation and variance of
n independent observations of X:

E(X1 + X2 + ... + Xn) = nE(X)
Var(X1 + X2 + ... + Xn) = nVar(X)


BULLET POINTS

■ Probability distributions describe the probability of
all possible outcomes of a given random variable.

■ The expectation of a random variable X is the
expected long-term average. It's represented as
either E(X) or μ. It's calculated using

E(X) = Σ x P(X = x)

■ The variance of a random variable X is given by

Var(X) = E(X - μ)²

■ The standard deviation σ is the square root of the
variance.

■ Linear transforms are when a random variable
X is transformed into aX + b, where a and b are
numbers. The expectation and variance are given by

E(aX + b) = aE(X) + b
Var(aX + b) = a²Var(X)
using discrete probability distributions

Below are a series of scenarios. Assuming you know the distribution
of each X, your task is to say whether you can solve each problem
using linear transforms or independent observations.

For each scenario, choose one: Linear transform, or Independent observation.

The amount of coffee in an extra large cup of coffee; X is the amount
of coffee in a normal-sized cup.

Drinking an extra cup of coffee per day; X is the amount of coffee
in a cup.

Finding the net gain from buying 10 lottery tickets; X is the net gain of
buying 1 lottery ticket.

Finding the net gain from a lottery ticket after the price of tickets
goes up; X is the net gain of buying 1 lottery ticket.

Buying an extra hen to lay eggs for breakfast; X is the number of
eggs laid per week by a certain breed of hen.
linear transform or independent observation solution

Below are a series of scenarios. Assuming you know the distribution
of each X, your task is to say whether you can solve each problem
using linear transforms or independent observations.

The amount of coffee in an extra large cup of coffee; X is the amount
of coffee in a normal-sized cup.
Answer: Linear transform.

Drinking an extra cup of coffee per day; X is the amount of coffee
in a cup.
Answer: Independent observation.

Finding the net gain from buying 10 lottery tickets; X is the net gain of
buying 1 lottery ticket.
Answer: Independent observation. The winnings from each lottery ticket
are independent of the others.

Finding the net gain from a lottery ticket after the price of tickets
goes up; X is the net gain of buying 1 lottery ticket.
Answer: Linear transform. The price of a ticket changes, but not the
probability of winning, so this can be solved with linear transforms.

Buying an extra hen to lay eggs for breakfast; X is the number of
eggs laid per week by a certain breed of hen.
Answer: Independent observation.




The local diner has started selling fortune cookies at $0.50 per cookie. Hidden within 
each cookie is a secret message. Most messages predict a good future for the buyer, 
but others offer money off at the diner. The probability of getting $2 off is 0.1, the 
probability of getting $5 off is 0.07, and the probability of getting $10 off is 0.03. 

If X is the net gain, what’s the probability distribution of X? What are the values of E(X) 
and Var(X)? 


The diner decides to put the price of the cookies up to $1. What are the new expectation and variance? 
exercise solution



The local diner has started selling fortune cookies at $0.50 per cookie. Hidden within 
each cookie is a secret message. Most messages predict a good future for the buyer, 
but others offer money off at the diner. The probability of getting $2 off is 0.1, the 
probability of getting $5 off is 0.07, and the probability of getting $10 off is 0.03. 


If X is the net gain, what’s the probability distribution of X? What are the values of E(X) 
and Var(X)? 


Here’s the probability distribution of X:

x           -0.5    1.5     4.5     9.5
P(X = x)    0.8     0.1     0.07    0.03

E(X) = (-0.5) x 0.8 + 1.5 x 0.1 + 4.5 x 0.07 + 9.5 x 0.03
     = -0.4 + 0.15 + 0.315 + 0.285
     = 0.35

Var(X) = E(X - μ)²
       = Σ (x - μ)² P(X = x)
       = (-0.5 - 0.35)² x 0.8 + (1.5 - 0.35)² x 0.1 + (4.5 - 0.35)² x 0.07 + (9.5 - 0.35)² x 0.03
       = (-0.85)² x 0.8 + (1.15)² x 0.1 + (4.15)² x 0.07 + (9.15)² x 0.03
       = 0.7225 x 0.8 + 1.3225 x 0.1 + 17.2225 x 0.07 + 83.7225 x 0.03
       = 0.578 + 0.13225 + 1.205575 + 2.511675
       = 4.4275


The diner decides to put the price of the cookies up to $1. What are the new expectation and variance? 

The diner puts the price of cookies up by $0.50, which means
the new net gains are modelled by X - 0.5.

E(X - 0.5) = E(X) - 0.5
           = 0.35 - 0.5
           = -0.15

Var(X - 0.5) = Var(X)
             = 4.4275
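The fortune cookie figures can be double-checked with a few lines of code. This is a minimal sketch using the prices and probabilities from the exercise; the probability of winning nothing (0.8) is what remains after the three prizes, and the function and variable names are illustrative only.

```python
def expectation(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

price = 0.50
# Money off at the diner and its probability; P(no prize) = 1 - 0.1 - 0.07 - 0.03.
prizes = {0: 0.8, 2: 0.1, 5: 0.07, 10: 0.03}

# Net gain = prize minus the price paid for the cookie.
X = {prize - price: p for prize, p in prizes.items()}
e_x = expectation(X)
var_x = variance(X)

# Price rises to $1: every net gain drops by a further $0.50 (linear transform X - 0.5).
new_X = {x - 0.5: p for x, p in X.items()}

print("E(X) =", round(e_x, 4), " Var(X) =", round(var_x, 4))
print("E(X - 0.5) =", round(expectation(new_X), 4),
      " Var(X - 0.5) =", round(variance(new_X), 4))
```

The expectation shifts by the price change, while the variance is untouched, as the linear transform rules predict.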


New slot machine on the block 

Fat Dan has brought in a new model slot machine. Each 
game costs more, but if you win you’ll win big. Here’s the 
probability distribution: 


x           -5      395
P(X = x)    0.99    0.01


We’ve looked at the expectation and variance of playing a 
single machine, and also for playing several independent games 
on the same machine. What happens if we play two different 
machines at once? 

In this situation, we have two different, independent probability 
distributions for our machines: 



x           -5      395
P(X = x)    0.99    0.01

(These are the current gains of Fat Dan’s slot machine.)

y           -2      23      48      73      98
P(Y = y)    0.977   0.008   0.008   0.006   0.001


So how can we find the expectation and variance of playing 
one game each on both machines? 







We could work out the probability 
distribution of X + Y, but that would be time- 
consuming, and we might make a mistake. I 
wonder if we can take another shortcut? 
addition and subtraction of random variables


Add E(X) and E(Y) to get E(X + Y)...

We want to find the expectation and variance of playing one game each 
on both of the slot machines. In other words, we want to find E(X + Y) 
and Var(X + Y) where X and Y are random variables representing the 
two machines. X and Y are independent. 

One way of doing this would be to calculate the probability distribution 
of X + Y, and then calculate the expectation and variance. 


[Figure: the probability distributions of X and Y combine to give the distribution of X + Y. Annotation: Don’t worry, we’re not asking you to calculate this.]




Fortunately we don’t have to do this. To find E(X + Y), all we
need to do is add together E(X) and E(Y).

Intuitively this makes sense. If, for example, you were playing
two games where you would expect to win $5 in one game
and $10 in the other, you would expect to win $15 overall:
$5 + $10.

We can do something similar with the variance. To find 
Var(X +Y), we add the two variances together. This works for 
all independent random variables. 


E(X + Y) = E(X) + E(Y) 


Var(X + Y) = Var(X) + Var(Y) 
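To see the shortcut at work, the sketch below builds the full distribution of X + Y for the two slot machines above the long way, then compares its expectation and variance with E(X) + E(Y) and Var(X) + Var(Y). The helper names are illustrative, and independence of the two machines is assumed, as in the text.

```python
from collections import defaultdict
from itertools import product

def expectation(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# The two slot machine distributions from the text.
X = {-5: 0.99, 395: 0.01}
Y = {-2: 0.977, 23: 0.008, 48: 0.008, 73: 0.006, 98: 0.001}

# The long way: every pair (x, y) contributes P(X = x) * P(Y = y) to x + y.
# Multiplying the probabilities is only valid because X and Y are independent.
combined = defaultdict(float)
for (x, px), (y, py) in product(X.items(), Y.items()):
    combined[x + y] += px * py
combined = dict(combined)

print("E(X + Y) the long way:  ", round(expectation(combined), 4))
print("E(X) + E(Y):            ", round(expectation(X) + expectation(Y), 4))
print("Var(X + Y) the long way:", round(variance(combined), 4))
print("Var(X) + Var(Y):        ", round(variance(X) + variance(Y), 4))
```

Both routes agree, but the shortcut never needs the ten-row distribution of X + Y.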


[Figure: the distributions of X, Y, and X + Y, with E(X), Var(X), E(Y), Var(Y), E(X + Y), and Var(X + Y) marked.]

Adding the variances together only works for independent random variables.

If X and Y are not independent,
then Var(X + Y) is no longer
equal to Var(X) + Var(Y).
...and subtract E(X) and E(Y) to get E(X - Y)


You’re not just limited to adding random variables; you
can also subtract one from the other. Instead of using the
probability distribution of X + Y, we can use X - Y.

If you’re dealing with the difference between two random
variables, it’s easy to find the expectation. To find E(X - Y),
we subtract E(Y) from E(X).

E(X - Y) = E(X) - E(Y)

Finding the variance of X - Y is less intuitive. To find
Var(X - Y), we add the two variances together.

Var(X - Y) = Var(X) + Var(Y)

(We add the variances, so be careful.)




Because the variability increases. 

When we subtract one random variable 
from another, the variance of the probability 
distribution still increases. 



If you’re subtracting two random variables, add the variances.

It’s easy to make this mistake as at first glance it
seems counterintuitive. Just remember that if the two
variables are independent,
Var(X - Y) = Var(X) + Var(Y).

[Figure: the distributions of X, Y, and X - Y, with E(X), Var(X), E(Y), Var(Y), E(X - Y), and Var(X - Y) marked.]




When we subtract independent random variables, the 
variance is exactly the same as if we’d added them together. 
The amount of variability can only increase. 


Subtracting independent
random variables still
increases the variance.
You can also add and subtract linear transformations

It doesn’t stop there. As well as adding and subtracting random variables, 
we can also add and subtract their linear transforms. 

Imagine what would happen if Fat Dan changed the cost and prizes on both 
machines, or even just one of them. The last thing we’d want to do is work 
out the entire probability distribution in order to find the new expectations 
and variances. 

Fortunately, we can take another shortcut. 

Suppose the gains on the X and Y slot machines are changed so that the 
gains for X become aX, and the gains for Y become bY. a and b can be any 
number. 

To find the expectation and variance for combinations of aX and bY, 
we can use the following shortcuts. 

Adding aX and bY 

If we want to find the expectation and variance of 
aX + bY, we use 

E(aX + bY) = aE(X) + bE(Y) 

Var(aX + bY) = a²Var(X) + b²Var(Y)

We square the numbers because it’s a linear 
transform, just like before. 


Subtracting aX and bY 

If we subtract the random variables and calculate 
E(aX - bY) and Var(aX - bY), we use 

E(aX - bY) = aE(X) - bE(Y)

Var(aX - bY) = a²Var(X) + b²Var(Y)

Just as before, we add the variances, even though
we’re subtracting the random variables.

[Figure: X becomes aX and Y becomes bY; a and b can be any numbers.]
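The rules for aX - bY can be checked the same way. In this sketch the scale factors a = 2 and b = 3 are hypothetical values picked for illustration; the distributions are the two slot machines from earlier, assumed independent.

```python
from collections import defaultdict

def expectation(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def combine(dist_x, dist_y, a, b):
    """Exact distribution of a*X + b*Y for independent X and Y."""
    out = defaultdict(float)
    for x, px in dist_x.items():
        for y, py in dist_y.items():
            out[a * x + b * y] += px * py
    return dict(out)

X = {-5: 0.99, 395: 0.01}
Y = {-2: 0.977, 23: 0.008, 48: 0.008, 73: 0.006, 98: 0.001}
a, b = 2, 3  # hypothetical scale factors

difference = combine(X, Y, a, -b)  # the distribution of aX - bY

shortcut_e = a * expectation(X) - b * expectation(Y)
shortcut_var = a**2 * variance(X) + b**2 * variance(Y)  # variances are added

print("E(aX - bY):  ", round(expectation(difference), 4), "shortcut:", round(shortcut_e, 4))
print("Var(aX - bY):", round(variance(difference), 4), "shortcut:", round(shortcut_var, 4))
```

Passing `-b` to `combine` is the same observation the text makes about b = -1: squaring the coefficient removes the sign, so the variances add.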
Q: So if X and Y are games, does aX + bY mean a games of X and b games
of Y?

A: aX + bY actually refers to two linear transforms added together. In
other words, the underlying values of X and Y are changed. This is
different from independent observations, where each game would be an
independent observation.

Q: I can’t see when I’d ever want to use X - Y. Does it have a purpose?

A: X - Y is really useful if you want to find the difference between two
variables. E(X - Y) is a bit like saying “What do you expect the
difference between X and Y to be?”, and Var(X - Y) tells you the variance.

Q: Why do you add the variances for X - Y? Surely you’d subtract them?

A: At first it sounds counterintuitive, but when you subtract one
variable from another, you actually increase the amount of variability,
and so the variance increases. The variability of subtracting a variable
is actually the same as adding it.

Another way of thinking of it is that calculating the variance squares
the underlying values. Var(X + bY) is equal to Var(X) + b²Var(Y), and if
b is -1, this gives us Var(X - Y). As (-1)² = 1, this means that
Var(X - Y) = Var(X) + Var(Y).

Q: Can we do this if X and Y aren’t independent?

A: No, these rules only apply if X and Y are independent. If you need to
find the variance of X + Y where there’s dependence, you’ll have to
calculate the probability distribution from scratch.

Q: It looks like the same rules apply for X + Y as X1 + X2. Is this
correct?

A: Yes, that’s right, as long as X, Y, X1 and X2 are all independent.


BULLET POINTS - 

■ Independent observations of X are different instances 
of X. Each observation has the same probability 
distribution, but the outcomes can be different. 

■ If X1, X2, ..., Xn are independent observations of X then:

E(X1 + X2 + ... + Xn) = nE(X)

Var(X1 + X2 + ... + Xn) = nVar(X)


■ If X and Y are independent random variables, then: 

E(X + Y) = E(X) + E(Y) 

E(X - Y) = E(X)- E(Y) 

Var(X + Y) = Var(X) + Var(Y) 

Var(X - Y) = Var(X) + Var(Y) 


■ The expectation and variance of linear transforms of X 
and Y are given by 

E(aX + bY) = aE(X) + bE(Y) 

E(aX - bY) = aE(X) - bE(Y) 

Var(aX + bY) = a²Var(X) + b²Var(Y)

Var(aX - bY) = a²Var(X) + b²Var(Y)
expectation and variance exercises

Exercise

Below you'll see a table containing expectations and variances. Write the formula or shortcut for
each one in the table. Where applicable, assume variables are independent.

Statistic              Shortcut or formula
E(aX + b)
Var(aX + b)
E(X)
E(f(X))
Var(aX - bY)
Var(X)
E(aX - bY)
E(X1 + X2 + X3)
Var(X1 + X2 + X3)
E(X²)
Var(aX - b)




A restaurant offers two menus, one for weekdays and the other for weekends. Each menu offers
four set prices, and the probability distributions for the amount someone pays are as follows:

Weekday:

x           10      15      20      25
P(X = x)    0.2     0.5     0.2     0.1

Weekend:

y           15      20      25      30
P(Y = y)    0.15    0.6     0.2     0.05


Who would you expect to pay the restaurant most: a group of 20 eating at the weekend, or a 
group of 25 eating on a weekday? 
exercise solutions

Exercise Solution

Below you'll see a table containing expectations and variances. Write the formula or shortcut for
each one in the table. Where applicable, assume variables are independent.

Statistic              Shortcut or formula
E(aX + b)              aE(X) + b
Var(aX + b)            a²Var(X)
E(X)                   Σ x P(X = x)
E(f(X))                Σ f(x) P(X = x)
Var(aX - bY)           a²Var(X) + b²Var(Y)
Var(X)                 E(X - μ)² = E(X²) - μ²
E(aX - bY)             aE(X) - bE(Y)
E(X1 + X2 + X3)        3E(X)
Var(X1 + X2 + X3)      3Var(X)
E(X²)                  Σ x² P(X = x)
Var(aX - b)            a²Var(X)



A restaurant offers two menus, one for weekdays and the other for weekends. Each menu offers
four set prices, and the probability distributions for the amount someone pays are as follows:

Weekday:

x           10      15      20      25
P(X = x)    0.2     0.5     0.2     0.1

Weekend:

y           15      20      25      30
P(Y = y)    0.15    0.6     0.2     0.05


Who would you expect to pay the restaurant most: a group of 20 eating at the weekend, or a 
group of 25 eating on a weekday? 


Let’s start by finding the expectation of a weekday and a weekend. X represents
someone paying on a weekday, and Y represents someone paying at the weekend.

E(X) = 10 x 0.2 + 15 x 0.5 + 20 x 0.2 + 25 x 0.1
     = 2 + 7.5 + 4 + 2.5
     = 16

E(Y) = 15 x 0.15 + 20 x 0.6 + 25 x 0.2 + 30 x 0.05
     = 2.25 + 12 + 5 + 1.5
     = 20.75

Each person eating at the restaurant is an independent observation, and to find the
amount spent by each group, we multiply the expectation by the number in each group.

25 people eating on a weekday gives us 25 x E(X) = 25 x 16 = 400

20 people eating at the weekend gives us 20 x E(Y) = 20 x 20.75 = 415

This means we can expect 20 people eating at the weekend to pay more than 25
people eating on a weekday.
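As a quick check of the group totals, the sketch below applies the nE(X) shortcut to the two menus. The variable names are illustrative, and each diner is treated as an independent observation, as in the solution.

```python
def expectation(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

weekday = {10: 0.2, 15: 0.5, 20: 0.2, 25: 0.1}
weekend = {15: 0.15, 20: 0.6, 25: 0.2, 30: 0.05}

# n independent observations: expected total = n * E(X).
weekday_group = 25 * expectation(weekday)
weekend_group = 20 * expectation(weekend)

print("25 weekday diners:", round(weekday_group, 2))
print("20 weekend diners:", round(weekend_group, 2))
```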
you’re an expectation expert!

Jackpot!


You’ve covered a lot of ground in 
this chapter. You learned how to use 
probability distributions, expectation, 
and variance to predict how much you 
stand to win by playing a specific slot 
machine. 


And you discovered how to use 
linear transforms and independent 
observations to anticipate how much 
you’ll win when the payout structure 
changes or when you play multiple 
games on the same machine. 




Sam likes to eat out at two restaurants. Restaurant A is generally more expensive than 
restaurant B, but the food quality is generally much better. 

Below you'll find two probability distributions detailing how much Sam tends to spend at each 
restaurant. As a general rule, what would you say is the difference in price between the two 
restaurants? What’s the variance of this? 


Restaurant A:

x           20      30      40      45
P(X = x)    0.3     0.4     0.2     0.1

Restaurant B:

y           10      15      18
P(Y = y)    0.2     0.6     0.2
exercise solution



Sam likes to eat out at two restaurants. Restaurant A is generally more expensive than 
restaurant B, but the food quality is generally much better. 

Below you'll find two probability distributions detailing how much Sam tends to spend at each 
restaurant. As a general rule, what would you say is the difference in price between the two 
restaurants? What’s the variance of this? 


Restaurant A:

x           20      30      40      45
P(X = x)    0.3     0.4     0.2     0.1

Restaurant B:

y           10      15      18
P(Y = y)    0.2     0.6     0.2


Let’s start by finding the expectation and variance of X and Y.

E(X) = 20 x 0.3 + 30 x 0.4 + 40 x 0.2 + 45 x 0.1
     = 6 + 12 + 8 + 4.5
     = 30.5

Var(X) = (20 - 30.5)² x 0.3 + (30 - 30.5)² x 0.4 + (40 - 30.5)² x 0.2 + (45 - 30.5)² x 0.1
       = (-10.5)² x 0.3 + (-0.5)² x 0.4 + 9.5² x 0.2 + 14.5² x 0.1
       = 110.25 x 0.3 + 0.25 x 0.4 + 90.25 x 0.2 + 210.25 x 0.1
       = 33.075 + 0.1 + 18.05 + 21.025
       = 72.25

E(Y) = 10 x 0.2 + 15 x 0.6 + 18 x 0.2
     = 2 + 9 + 3.6
     = 14.6

Var(Y) = (10 - 14.6)² x 0.2 + (15 - 14.6)² x 0.6 + (18 - 14.6)² x 0.2
       = (-4.6)² x 0.2 + 0.4² x 0.6 + 3.4² x 0.2
       = 21.16 x 0.2 + 0.16 x 0.6 + 11.56 x 0.2
       = 4.232 + 0.096 + 2.312
       = 6.64

The difference between X and Y is modeled by X - Y.

E(X - Y) = E(X) - E(Y)
         = 30.5 - 14.6
         = 15.9

Var(X - Y) = Var(X) + Var(Y)
           = 72.25 + 6.64
           = 78.89
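The same working can be reproduced in a few lines. This is a sketch only: the names are illustrative, and adding the variances assumes, as the solution does, that Sam's spending at the two restaurants is independent.

```python
def expectation(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

restaurant_a = {20: 0.3, 30: 0.4, 40: 0.2, 45: 0.1}  # X
restaurant_b = {10: 0.2, 15: 0.6, 18: 0.2}           # Y

# E(X - Y) = E(X) - E(Y); Var(X - Y) = Var(X) + Var(Y) for independent X and Y.
expected_difference = expectation(restaurant_a) - expectation(restaurant_b)
variance_of_difference = variance(restaurant_a) + variance(restaurant_b)

print("E(X - Y)   =", round(expected_difference, 2))
print("Var(X - Y) =", round(variance_of_difference, 2))
```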
6 permutations and combinations

Making Arrangements

Sometimes, order is important.

Counting all the possible ways in which you can order things is time
consuming, but the trouble is, this sort of information is crucial for
calculating some probabilities. In this chapter, we’ll show you a quick way
of deriving this sort of information without you having to figure out what all
of the possible outcomes are. Come with us and we’ll show you how to
count the possibilities.
at the racetrack


The Statsville Derby

One of the biggest sporting events in Statsville is the Statsville Derby. 
Horses and jockeys travel from far and wide to see which horse can 
complete the track in the shortest time, and you can place bets on the 
outcome of each race. There’s a lot of money to be made if you can 
predict the top three finishers in each race. 

The opening set of races is for rookies, horses that have never 
competed in a race before. This time, no statistics are available for 
previous races to help you anticipate how well each horse will do. 
This means you have to assume that each horse has an equal chance 
of winning, and it all comes down to simple probability. 

The first race of the day, the three-horse race, is just about to begin, 
and the Derby is taking bets. You have $500 of winnings from Fat
Dan’s Casino to spend at the Derby. If you can correctly predict the
order in which the three horses finish, the payout is 7:1, which means
you’ll win 7 times your bet, or $3,500.



Should we take this bet? Let’s work out some probabilities and find
out.

Want to join in with the fun? If you know a thing or two
about probability, you could do very well indeed.

A 7:1 payout means that if you win, you’ll earn 7 times your bet.
It’s a three-horse race

The first race is a very simple one between three horses, and in order 
to make the most amount of money, you need to predict the exact 
order in which horses finish the race. Here are the contenders. 



Cheeky Sherbet, Ruby Toupee, and Frisky Funboy


Sharpen your pencil



How many different ways are there in which the horses can finish 
the race? (Assume there are no ties and that every horse finishes.) 
What’s the probability of winning a bet on the correct finishing 
order? 


Calculate your expected winnings for this bet.

Hint: Find the probability distribution for this event.
Then use this to calculate the expectation.
sharpen solution

Sharpen your pencil
Solution

How many different ways are there in which the horses can finish
the race? (Assume there are no ties and that every horse finishes.)
What’s the probability of winning a bet on the correct finishing
order?

Calculate your expected winnings for this bet.

There are 6 ways the race can be finished:

Cheeky Sherbet, Ruby Toupee, Frisky Funboy
Cheeky Sherbet, Frisky Funboy, Ruby Toupee
Ruby Toupee, Cheeky Sherbet, Frisky Funboy
Ruby Toupee, Frisky Funboy, Cheeky Sherbet
Frisky Funboy, Cheeky Sherbet, Ruby Toupee
Frisky Funboy, Ruby Toupee, Cheeky Sherbet

The probability of getting the order right is therefore 1/6.

Here’s the probability distribution for the amount of money you can expect
to win if you bet $500 with odds of 7:1.

Three-horse race:

x           -500     3,500
P(X = x)    0.833    0.167

E(X) = -500 x 0.833 + 3,500 x 0.167
     = 168

We can expect to win $168 each time this race is run.

Yes, you can expect to win $168 on this bet, but the house
is still going to win 5/6 times you play. Do you feel lucky?
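For a race this small, the finishing orders can simply be enumerated. The sketch below lists them with Python's `itertools.permutations` and recomputes the expected winnings; the variable names are illustrative. It also shows that the $168 figure depends on rounding the probabilities to three decimal places: with the exact fractions 5/6 and 1/6, the expectation is 500/3, or about $166.67.

```python
from fractions import Fraction
from itertools import permutations

horses = ["Cheeky Sherbet", "Ruby Toupee", "Frisky Funboy"]

# Every possible finishing order, assuming no ties and every horse finishes.
orders = list(permutations(horses))
for order in orders:
    print(", ".join(order))

p_win = Fraction(1, len(orders))  # each order is assumed equally likely

bet = 500
payout = 7 * bet  # 7:1 payout, treated as the gain on a win, as in the text

# Expected winnings with exact probabilities.
exact = -bet * (1 - p_win) + payout * p_win

# Expected winnings with the probabilities rounded to 3 decimal places.
rounded = -bet * 0.833 + payout * 0.167

print("Number of orders:", len(orders))
print("Exact expectation:", float(exact))
print("Expectation with rounded probabilities:", round(rounded, 2))
```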



A three-horse race? How 
likely is that? Most races will 
have far more horses taking part. 
Exactly, most races will have more than three horses.

So what we need is some quick way of figuring out how many finishing orders 
there are for each race, one that works irrespective of how many horses are 
racing. 

Working out the number of ways in which three horses can finish a race is 
straightforward; there are only 6 possibilities. The trouble is, the more horses 
there are taking part in the race, the harder and more time consuming it is to 
work out every possible finishing order. 

Let’s take a closer look at the different ways of ordering the three horses we have 
for the race and see if we can spot a pattern. We can do this by looking at each 
position, one by one. 


















How many ways can they cross the finish line?


Let’s start by looking at the first position of the race. 

One of the horses has to win the race, and this 
can be any one of the three horses taking part. 

This means that there are three ways of filling the 
number one position. 



3 ways

Only one horse can cross the finish line first, and it can be any of the
three horses.


So what about the second position in the race? 

If one of the horses has finished the race, this 
means there are two horses left. Either of these can 
come second in the race. This means that there are 
two ways of filling the number two position, no 
matter which horse came first. 


2 ways

One horse has already finished the race, so there are
only two horses that can finish second.


Once two horses have finished the race, there’s
only one position left for the final horse: third
place.

1 way

Only one horse hasn’t finished the race, so
there’s only one position left for him: last.

So how does this help us calculate all the possible
finishing orders?
making arrangements


Calculate the number of arrangements

We just saw that there were 3 ways of filling the first position, and for 
each of these, there are 2 ways of filling the second position. And no 
matter how those first two slots are filled, there’s only one way of filling 
the last position. In other words, the number of ways in which we can fill 
all three positions is: 

3 x 2 x 1 = 6

(3 ways of filling the 1st position, 2 ways of filling the 2nd position, and
1 way of filling the 3rd position give 6 ways of filling all 3 positions.)

This means that we can tell there are 6 different ways of ordering the 
three horses, without us having to figure out each of the arrangements. 


So what if there are n horses?

You’ve seen that there are 3x2x1 ways of ordering 3 horses. You can 
generalize this for any number n. If you want to work out the number 
of ways there are of ordering n separate objects, you can get the right 
result by calculating: 

n x (n - 1) x (n - 2) x ... x 3 x 2 x 1

This means that if you have to work out the number of ways in which 
you can order n separate objects, you can come up with a precise figure 
without having to figure out every possible arrangement. 

This type of calculation is called the factorial of a number. In math 
notation, factorials are represented as an exclamation point. For 
example, the factorial of 3 is written as 3!, and the factorial of n is n!. 

You pronounce it “n factorial.” 

So when we write n!, this is just a shorthand way of saying “take all 
the numbers from n down to 1, and multiply them together.” In other 
words, perform the following calculation: 

n! = n x (n - 1) x (n - 2) x ... x 3 x 2 x 1 

The advantage of n! is that a lot of calculators have this as an available 
function. If, for example, you want to find the number of arrangements 
of 4 separate objects, all you have to do is calculate 4!, giving you 
4x3x2x1=24 separate arrangements. 
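If you have a computer rather than a calculator to hand, the same calculation is one function call. The sketch below uses Python's standard `math.factorial`, and compares it with a brute-force count of orderings for small n; `count_orderings` is a helper written just for this illustration.

```python
import math
from itertools import permutations

def count_orderings(n):
    """Count the orderings of n distinct objects by listing every one."""
    return sum(1 for _ in permutations(range(n)))

for n in range(1, 7):
    # The brute-force count and n! should agree for every n.
    print(n, count_orderings(n), math.factorial(n))

print("4! =", math.factorial(4))
print("10! =", math.factorial(10))
```

Listing every ordering quickly becomes impractical as n grows, which is exactly why the factorial shortcut is useful.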
Going round in circles


There’s one exception to this rule, and that’s if you’re arranging 
objects in a circle. 

Here’s an example. Imagine you want to stand four horses in a circle, 
and you want to find the number of possible ways in which you can 
order them. Now, let’s focus on arrangements where Frisky Funboy 
has Ruby Toupee on his immediate right, and Cheeky Sherbet on his 
immediate left. Here are two of the four possible arrangements of this. 



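[Figure: two circles of four horses, each showing Frisky Funboy with Ruby Toupee on his immediate right and Cheeky Sherbet on his immediate left; the fourth horse is labeled "Other".]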



At first glance, these two arrangements look different, but they’re actually 
the same. The horses are in exactly the same positions relative to each 
other, the only difference is that in the second arrangement, the horses 
have walked a short distance round the circle. This means that some of 
the ways in which you can order the horses are actually the same. 

So how do we solve this sort of problem? 


The key here is to fix the position of one of the horses, say Frisky Funboy. 
With Frisky Funboy standing in a fixed position, you can count the 
number of ways in which the remaining 3 horses can be ordered, and this 
will give you the right result without any duplicates. 


In general, if you have n objects you need to arrange in a circle, the
number of possible arrangements is given by

(n - 1)!
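The (n - 1)! rule can be confirmed by brute force for small circles. The sketch below lists every ordering, then treats two orderings as the same circle if one is a rotation of the other, which is the "fix one horse" idea described above. The function name and the fourth horse's label are placeholders for this example.

```python
import math
from itertools import permutations

def circular_arrangements(items):
    """Count orderings of items in a circle, treating rotations as identical."""
    seen = set()
    for order in permutations(items):
        # Rotate each ordering so that one fixed item always comes first.
        start = order.index(items[0])
        seen.add(order[start:] + order[:start])
    return len(seen)

horses = ("Frisky Funboy", "Ruby Toupee", "Cheeky Sherbet", "Other")

print("Orderings in a line:  ", math.factorial(len(horses)))
print("Orderings in a circle:", circular_arrangements(horses))
print("(n - 1)! =            ", math.factorial(len(horses) - 1))
```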
no dumb questions

Q: How do I pronounce n!?

A: You pronounce it as “n factorial.” The ! symbol is used to indicate a
mathematical operation, and not to indicate any sort of exclamation.

Q: Are factorials just used when you’re arranging objects?

A: Not at all. Factorials also come into play in other branches of
mathematics, like calculus. In general, they’re a useful math shorthand,
and you’ll see the factorial symbol whenever you’re faced with this sort
of multiplication task.

All the factorial symbol really means is “take all the numbers from n
down to 1 and multiply them together.”

Q: What if I have a value 0? How do I find 0!?

A: 0! is actually 1. This may seem like a strange result, but it’s a bit
like saying there's only one way to arrange 0 objects.

Q: What about if you want to find the factorial of a negative number? Or
one that’s not an integer?

A: Factorials only work with non-negative integers, so you can’t find the
factorial of a negative number, or one that’s not an integer.

One way of looking at this is that it doesn’t make sense to arrange bits
of objects. Each thing you’re arranging is classed as a whole object.
Equally, you can’t have a negative number of objects.

Q: Can the result of a factorial ever be an odd number?

A: There are only two occasions where this can be true: when n is 0 or
when n is 1. In both these cases, n! = 1.

For all other values of n, n! is even. This is because if n is greater
than or equal to 2, the calculation must include the number 2. Any
integer multiplied by 2 is even, so this means that n! is even if n is
greater than or equal to 2.

Q: Calculating factorials for large numbers seems like a pain. If I want
to find 10!, I have to multiply 10 numbers (10x9x8x7x6x5x4x3x2x1), and
the result gets really big. Is there an easier way?

A: Yes, many scientific and graphing calculators have a factorial key
(typically labeled n!) that will perform this calculation for you.

Q: If I’m arranging n objects in a circle, there are (n - 1)!
arrangements. What if clockwise and counterclockwise arrangements are
considered to be the same?

A: In this case, the number of arrangements is (n - 1)!/2. Calculating
(n - 1)! gives you twice the number of arrangements you actually need as
it gives you both clockwise and counterclockwise arrangements. Dividing
by 2 gives you the right answer.

Q: What if I’m arranging objects in a circle and absolute position
matters?

A: In this case the number of arrangements is given by n!. In that
situation, it’s exactly the same as arranging n objects.


Vital Statistics

Formulas for arrangements

If you want to find the number of possible arrangements of n objects,
use n! where

n! = n x (n - 1) x ... x 3 x 2 x 1

In other words, multiply together all the numbers from n to 1.

If you are arranging n objects in a circle, then there are (n - 1)!
possible arrangements.
Exercise


Paula wants to telephone the Statsville Health Club, but she has a very poor memory. She 
knows that the telephone number contains the numbers 1,2,3,4,5,6 and 7, but she can’t 
remember the order. What’s the probability of getting the right number at random? 


Paula has just been reminded that the first three numbers are some arrangement of the
numbers 1, 2 and 3, and the last four numbers are some arrangement of the numbers 4, 5, 6,
and 7. She can’t remember the order of each set of numbers though. What’s the probability
of getting the right telephone number now?

Hint: This time you need to arrange two groups of numbers.



Sharpen your pencil


The Statsville Derby is organizing a parade for the end of the season. 
10 horses are taking part, and they will parade round the race track 
in a circle. The exact horse order will be chosen at random, and if you 
guess the horse order correctly, you win a prize. 

What’s the probability that if you make a guess on the exact horse 
order, you’ll win the prize? 


exercise solutions



Paula wants to telephone the Statsville Health Club, but she has a very poor memory. She 
knows that the telephone number contains the numbers 1,2,3,4,5,6 and 7, but she can’t 
remember the order. What’s the probability of getting the right number at random? 

There are 7 numbers, so there are 7! possible arrangements. 7! = 7x6x5x4x3x2x1 = 5040.

The probability of getting the right number is therefore 1/5040 = 0.0002.


Paula has just been reminded that the first three numbers are some arrangement of the numbers
1, 2 and 3, and the last four numbers are some arrangement of the numbers 4, 5, 6, and 7. She
can’t remember the order of each set of numbers though. What’s the probability of getting the
right telephone number now? Hint: This time you need to arrange two groups of numbers.


We start by splitting the numbers into two groups, one for the first three numbers (1, 2, 3),
and another for the last four (4, 5, 6, 7). This gives us:

Number of ways of arranging 1, 2, 3 is 3! = 3x2x1 = 6

Number of ways of arranging 4, 5, 6, 7 is 4! = 4x3x2x1 = 24

To find the total number of possible arrangements, we multiply together the number of ways
of arranging each group. This gives:

Total number of possible arrangements is 3! x 4! = 6 x 24 = 144

The probability of getting the right number is therefore 1/144 = 0.007.


Sharpen your pencil 

Solution 


The Statsville Derby is organizing a parade for the end of the season. 
10 horses are taking part, and they will parade round the race track 
in a circle. The exact horse order will be chosen at random, and if you 
guess the horse order correctly, you win a prize. 

What’s the probability that if you make a guess on the exact horse 
order, you’ll win the prize? 


10 horses will be in a circle, which means there are 9! possible orders for the horses. 

9! = 362880, which means there are 362880 possible orders for the parade. 

The probability of guessing correctly is 1/9! = 1/362880, which is a number very close to 0. 
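
The three answers above can be checked with a few lines of Python. This is only a sketch; the variable names are ours and do not come from the book.

```python
from fractions import Fraction
from math import factorial

# 7 distinct digits in an unknown order: 7! possible phone numbers.
phone_orders = factorial(7)
p_phone = Fraction(1, phone_orders)

# The first three digits are 1, 2, 3 and the last four are 4, 5, 6, 7:
# arrange each group separately, then multiply the counts.
grouped_orders = factorial(3) * factorial(4)
p_grouped = Fraction(1, grouped_orders)

# 10 horses in a circle: rotating the whole circle gives the same order,
# so fix one horse in place and arrange the remaining 9.
circle_orders = factorial(10 - 1)
p_circle = Fraction(1, circle_orders)

print(phone_orders, float(p_phone))
print(grouped_orders, float(p_grouped))
print(circle_orders, float(p_circle))
```

Using `Fraction` keeps each probability exact until the moment it is printed.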
It’s time for the novelty race 

The Statsville Derby is unusual in that not all of the animals 
taking part in the races have to be horses. In the next race, 
three of the contenders are zebras, and they’re racing 
against three horses. 

In this race, it’s the type of animal that matters rather 
than the particular animal itself. In other words, all we’re 
interested in is which sort of animal finishes the race in 
which position. The question is, how many ways are there 
of ordering all the animals by species? 

The Derby’s offering a special bet: if you can predict 
whether a horse or zebra will finish in each place, the payout 
is 15:1. The question is, should you make this bet? 


In the last race, you had a 1/6 
probability of predicting the top 
finishers correctly. But let’s see how 
you fare in the novelty race; it’s a 
Statsville tradition. 






How would you go about solving this sort of problem? Write down your ideas 
in the space below. 


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arranging by type 


Arranging by individuals is different than arranging by type 

So if there are three horses and three zebras in today’s 
novelty race, how can we calculate how many different 
orderings there are of horses and zebras? 



That’s easy. There are 6 animals, so 
there are 6! ways of ordering them. 


This time we’re only interested in the type of animal, 
and not the particular animal itself. 

So far we’ve only looked at the number of ways in which we can order 
unique objects such as horses, and calculating 6! would be the correct 
result if this was what we needed on this occasion. 

This time around it’s different. We no longer care about which particular 
horse or zebra is in a particular position; we only care about what type 
of animal it is. 

As an example, if we looked at an arrangement where the three zebras 
came first and the three horses came last, we wouldn’t want to count all 
of the ways of arranging those three horses and three zebras. It doesn’t 
matter which particular zebra comes first; it’s enough to know it’s a 
zebra. 


For this sort of problem, we care about 
the type of animal, not about the 
actual animal. 


We need to arrange animals by type 


There are 6! ways of ordering the 6 animals, but the problem with this result is that it 
assumes we want to know all possible arrangements of individual horses and zebras. 


Let’s start by looking at the zebras. There are 3! ways of arranging the three 
zebras, and the result 6! includes each of these 3! arrangements. But since we’re 
not concerned about which individual zebra goes where, these arrangements are all 
the same. So, to eliminate these repetitions, we can just divide the total number of 
arrangements by 3! 



We’re classing the three zebras as alike, so we divide by 3!: 

6! / 3! 


Next, let’s take the horses. There are 3! ways of arranging the three horses, and the number of 
arrangements we have so far includes each of these 3! arrangements. As with the zebras, we 
divide the end result by 3! to eliminate duplicate orderings. 




This time we’re classing the three horses as alike, 
so we divide the number of arrangements by 3! again. 


This means that the number of ways of arranging the 6 animals according 
to species is 


6! / (3! 3!) = 720 / (6 x 6) 

             = 720 / 36 

             = 20 

The horses are alike, so we divide by 3!, and we do the same for the zebras. 


In other words, the probability of betting correctly on the right order in 
which the different species finish the race is 1/20. 

Turn the page and we’ll look at this in more detail. 


There’s a 1/20 chance of 
winning, but the payout’s 
only 15:1. I’d stay away 
from this bet. 
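
To see the division by 3! x 3! at work, here is a small brute-force check. It is a sketch only: the string `"HHHZZZ"` is our own encoding of three horses and three zebras.

```python
from itertools import permutations
from math import factorial

animals = "HHHZZZ"  # three horses, three zebras (our own encoding)

# Treating every animal as an individual gives 6! orderings.
individual_orders = factorial(len(animals))

# Recording only the species collapses orderings that look the same.
species_orders = set(permutations(animals))

# The formula from the text: 6! / (3! x 3!).
by_formula = factorial(6) // (factorial(3) * factorial(3))

print(individual_orders)    # 720
print(len(species_orders))  # 20
print(by_formula)           # 20
```

Each species ordering is produced 36 times by the 720 individual orderings, which is exactly the 3! x 3! we divide by.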


o 
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general formula for arranging by type 


Generalize a formula for arranging duplicates 


Imagine you need to count the number of ways in which n objects can be 
arranged. Then imagine that k of the objects are alike. 


To find the number of arrangements, start off by calculating the number of 
arrangements for the n objects as if they were all unique. Then divide by 
the number of ways in which the k objects (the ones that are alike) can be 
arranged. This gives you: 


n! / k! 

There are n objects in total, and k of the objects are alike. 


We can take this further. 

Imagine you want to arrange n objects, where k of one type are alike, and j of 
another type are alike, too. You can find the number of possible arrangements by 
calculating: 

n! / (j! k!) 

There are n objects in total. j of the objects of one type are alike, and so are k of another type. 


In general, when calculating arrangements 
that include duplicate objects, divide the total 
number of arrangements (n!) by the number of 
arrangements of each set of alike objects (j!, k!, 
and so on). 


Vital Statistics 

Arranging by type 

If you want to arrange n objects where 
j of one type are alike, k of another 
type are alike, so are m of another type 
and so on, the number of arrangements 
is given by 

n! / (j! k! m! ...) 
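
The general rule translates into a short helper. This is a minimal sketch; the function name `arrangements_by_type` is our own invention, not something defined in the book.

```python
from math import factorial

def arrangements_by_type(*counts):
    """Number of distinct orderings when counts[i] objects of type i are alike."""
    total = factorial(sum(counts))   # n!, as if every object were unique
    for count in counts:
        total //= factorial(count)   # divide by j!, k!, m!, ... in turn
    return total

print(arrangements_by_type(3, 3))     # 3 horses, 3 zebras -> 20
print(arrangements_by_type(1, 1, 1))  # all unique -> 3! = 6
```

A type with only one object contributes 1! = 1, so unique objects can be mixed in freely without changing the result.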
Exercise 


The Statsville Derby have decided to experiment with their races. They’ve decided to hold a race 
between 3 horses, 2 zebras and 5 camels, where all the animals are equally likely to finish the 
race first. 


1. How many ways are there of finishing the race if we’re interested in individual animals? 


2. How many ways are there of finishing the race if we’re just interested in the species of animal in each position? 


3. What’s the probability that all 5 camels finish the race consecutively if each animal has an equal chance of 
winning? (Assume we’re interested in the species in each position, not the individual animals themselves.) 


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exercise solution 



The Statsville Derby have decided to experiment with their races. They’ve decided to hold a race 
between 3 horses, 2 zebras and 5 camels, where all the animals are equally likely to finish the 
race first. 


1. How many ways are there of finishing the race if we’re interested in individual animals? 


There are 10 animals, so 
there are 10! = 3,628,800 ways. 


2. How many ways are there of finishing the race if we’re just interested in the species of animal in each position? 

There are 3 horses, 2 zebras and 5 camels. 

Number of arrangements = 10! / (3! 2! 5!) 

                       = 3,628,800 / (6 x 2 x 120) 

                       = 3,628,800 / 1,440 

                       = 2,520 

There are 10 animals in total. We count the 3 horses as being alike, and the 2 
zebras, and also the 5 camels. 

3. What’s the probability that all 5 camels finish the race consecutively if each animal has an equal chance of 
winning? (Assume we’re interested in the species in each position, not the individual animals themselves.) 


First of all, let’s find the number of ways in which the 5 camels can finish the race together. To do 
this, we class the 5 camels as one single object. That way, we’re guaranteed to keep them together. 

This means that if we add our 1 group of camels to the 3 horses and 2 zebras, we actually need to 
arrange 6 objects (1 group of camels + 3 horses + 2 zebras). 

Number of ways = 6! / (3! 2!) 

               = 720 / 12 

               = 60 

We count the 3 horses as being alike, and the 2 zebras. We don’t need to divide by anything for the 
camels, as we’re counting them as 1 object. 


Then to find the probability of this occurring, we just need to divide the number of ways the camels 
finish together by all the possible ways the animal types can finish the race, which we calculated above. 

The probability of all 5 camels finishing together is therefore 60/2520 = 1/42. 
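
The "bundle the camels into one object" trick can be double-checked by listing every species ordering directly. The sketch below is our own construction: it places the 5 camels and then the 2 zebras among 10 positions (the horses take whatever is left), and the helper name `arrangements_by_type` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def arrangements_by_type(*counts):
    """n! divided by the factorial of each group of alike objects."""
    total = factorial(sum(counts))
    for count in counts:
        total //= factorial(count)
    return total

all_orders = arrangements_by_type(3, 2, 5)       # horses, zebras, camels
camels_together = arrangements_by_type(3, 2, 1)  # camels bundled as 1 object
probability = Fraction(camels_together, all_orders)

# Brute force over every species ordering.
brute_total = 0
brute_together = 0
for camel_pos in combinations(range(10), 5):
    rest = [p for p in range(10) if p not in camel_pos]
    for zebra_pos in combinations(rest, 2):
        brute_total += 1
        # Five sorted positions are consecutive when they span exactly 5 places.
        if camel_pos[-1] - camel_pos[0] == 4:
            brute_together += 1

print(all_orders, camels_together, probability)
print(brute_total, brute_together)
```

Both routes give the same two counts, which is reassuring evidence that treating the camels as a single object loses nothing.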
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Why did you treat the 5 camels as 
one object in the last part of the exercise? 
Surely they’re individual camels. 

They’re individual camels, but in the 
last part of that problem we need to make 
sure we keep the camels together. To do this, 
we bundle all the camels together and treat 
them as one object. 


It seems like the number of 
arrangements for the different objects 
has a lot to do with how you group them 
into like groups. 

That’s right. Mastering arrangements 
is a skill, but a lot depends on how you think 
things through. 

The key thing is to think really carefully about 
what sort of problem you’re actually trying to 
solve and to get lots of practice. 


Are there many races where horses, 
zebras and camels all race together? 

It’s unlikely. But hey, this is Statsville, 
and the Statsville Derby runs its own events. 


It’s time for the twenty-horse race 


The novelty race is over, with the zebras taking the lead. 
The next race is between 20 horses. 






How would you go about finding the number of ways in which you can pick 
three horses out of twenty? 
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introducing permutations 


How many ways can we fill the top three positions? 


The main race is about to begin. There are twenty horses racing, and we need to find 
the number of possible arrangements of the top three horses. This way, we can work 
out the probability of guessing the exact order correctly. 

We can work out the solution the same way we did earlier, by looking at how many 
ways there are of filling the first three positions. 

Let’s start with the first position. There are 20 horses in total, so this means there are 
20 different ways of filling the first position. Once this position has been filled, that 
leaves 19 ways of filling the second position and 18 ways of filling the third. 


There are 20 horses, so there are 20 ways of filling the 
first position, then 19 for the second, and 18 for the 
third. 





In this race, we’re not interested in how the rest of the positions are filled; it’s 
only the first three positions that concern us. This means that the total number 
of arrangements for the top three horses is 

20 x 19 x 18 = 6,840 





So the probability of guessing the precise order in which the top three horses 
finish the race is 1/6,840. 


That gives us the right answer, but it could get complicated if 
there were more horses, or if we wanted to fill more positions. 


We need a more concise way of solving this sort of 
problem. 

At the moment we only have three numbers to multiply together, but 
what if there were more? 


We need to generalize a formula that will allow us to find the total 
number of arrangements of a certain number of horses, drawn from a 
larger pool of horses. 


Examining permutations 


So how can we rewrite the calculation in terms of factorials? 


The number of arrangements is 20 x 19 x 18. Let’s rewrite it and see 
where it gets us. 

20 x 19 x 18 = 20 x 19 x 18 x (17 x 16 x ... x 3 x 2 x 1) / (17 x 16 x ... x 3 x 2 x 1) 

             = 20! / 17! 

If we multiply by 17!/17!, it will still give us the same answer. 

This is the same expression that we had before, but this time written in 
terms of factorials. 


The number of arrangements of 3 objects taken from 20 is called the 
number of permutations. As you’ve seen, this is calculated using 

20! / (20 - 3)! = 2,432,902,008,176,640,000 / 355,687,428,096,000 

                = 6,840 

This is the same result we got earlier. 



In general, the number of permutations of r objects taken from n is the 
number of possible ways in which each set of r objects can be ordered. 
It’s generally written nPr (with n raised before the P and r lowered after it), where 

nPr = n! / (n - r)! 

Here n is the total number of objects, and r is the number of positions to fill. 


Permutations give 
the total number 
of ways you can 
order a certain 
number of objects 
(r), drawn from 
a larger pool of 
objects (n). 



So if you want to know how many ways there are of ordering r objects 
taken from a pool of n, permutations are the key. 
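
As a quick check of the formula, the sketch below computes the top-three count three ways: from the factorial formula, with Python's built-in `math.perm` (available from Python 3.8), and by listing every ordered triple. The helper name `n_p_r` is ours.

```python
from itertools import permutations
from math import factorial, perm

def n_p_r(n, r):
    """Permutations of r objects taken from n: n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

by_formula = n_p_r(20, 3)
by_library = perm(20, 3)
by_listing = sum(1 for _ in permutations(range(20), 3))

print(by_formula, by_library, by_listing)  # 6840 6840 6840
print(20 * 19 * 18)                        # 6840
```

Listing every triple is only practical for small numbers, which is exactly why the formula is worth having.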


I never said anything about the 
horse order. Just guess which 
horses are in the top three and 
I’ll make it worth your while... 
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introducing combinations 


What if horse order doesn’t matter? 


So far we’ve found the number of permutations of three horses taken from a group of 
twenty. This means that we know how many exact arrangements we can make. 

This time around, we don’t want to know how many different permutations there are. We want 
to know the number of combinations of the top three horses instead. We still want to know 
how many ways there are of filling the top three positions, but this time the exact arrangement 
doesn’t matter. 

We don’t need to know the order in which the horses finish the race; it’s 
enough to know which horses are in the top three. 







So how can we solve this sort of problem? 


At the moment, the number of permutations includes the number of ways of 
arranging the 3 horses that are in the top three. There are 3! ways of arranging each 
set of 3 horses, so let’s divide the number of permutations by 3!. This will give us the 
number of ways in which the top three positions can be filled but without the exact 
order mattering. 

The result is 


20! / (3! 17!) = 6,840 / 3! 

               = 1,140 


This means that there are 6,840 permutations for filling the first three places in the 
race, but if you’re not concerned about the order, there are 1,140 combinations. 
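
The same check works for combinations. This sketch uses `math.comb` (Python 3.8 or later) alongside a hand-written version of the formula; `n_c_r` is our own name for it.

```python
from itertools import combinations
from math import comb, factorial, perm

def n_c_r(n, r):
    """Combinations of r objects chosen from n: n! / (r! (n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))

by_formula = n_c_r(20, 3)
by_library = comb(20, 3)
by_listing = sum(1 for _ in combinations(range(20), 3))

# Dividing the permutation count by 3! removes the orderings of each trio.
from_permutations = perm(20, 3) // factorial(3)

print(by_formula, by_library, by_listing, from_permutations)  # 1140 four times
```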
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With a 1/1,140 chance of winning here, 
the odds are way against you. But the 
payout is also huge at 1,500:1, so you can 
actually expect to come out ahead. It all 
depends on how much of a risk taker you are. 
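
Whether a bet is worth taking comes down to its expected profit. The sketch below compares the novelty bet with this one. It assumes that a payout of 15:1 means you win 15 units of profit for each unit staked and lose your 1-unit stake otherwise; that reading of the odds, and the function name `expected_profit`, are our assumptions.

```python
from fractions import Fraction

def expected_profit(p_win, payout):
    """Expected profit per 1 unit staked (assumed meaning of 'payout:1')."""
    return p_win * payout - (1 - p_win) * 1

novelty = expected_profit(Fraction(1, 20), 15)
top_three = expected_profit(Fraction(1, 1140), 1500)

print(novelty)    # -1/5
print(top_three)  # 19/60
```

Under that assumption the novelty bet loses a fifth of a unit per bet on average, while the top-three bet gains about a third of a unit, even though you would expect to lose it the vast majority of the time.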






















permutations and combinations 


Examining combinations 


Earlier on we found a general way of calculating permutations. Well, 
there’s a way of doing this for combinations too. 

In general, the number of combinations is the number of ways of 
choosing r objects from n, without needing to know the exact order of the 
objects. The number of combinations is written nCr, where 

nCr = n! / (r! (n - r)!) 

Here n is the total number of objects, and r is the number of objects you choose. 
The n! / (n - r)! part is calculated in the same way as a permutation. You divide by 
an extra r! if it’s a combination. 

So what’s the difference between a combination and a permutation? 


Permutations 


A permutation is the number of ways in which you 
can choose objects from a pool, and where the order in 
which you choose them counts. It’s a lot more specific 
than a combination as you want to count the number 
of ways in which you fill each position. 

Permutation: order matters. 



Combinations 


A combination is the number of ways in which you 
can choose objects from a pool, without caring about 
the exact order in which you choose them. It’s a lot 
more general than a permutation as you don’t need to 
know how each position has been filled. It’s enough to 
know which objects have been chosen. 

Combination: order doesn’t matter. 
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interview with a combination 



Combination Exposed 

This week’s interview: 

Does order really matter? 


Head First: Combination, great to have you on the 
show. 

Combination: Thanks for inviting me, Head First. 

Head First: Now, let’s get straight down to business. A 
lot of people have noticed that you and Permutation 
are very similar to each other. Is that something 
you’d agree with? 

Combination: I can see why people might think 
that because we deal with very similar situations. 
We’re both very much concerned with choosing a 
certain number of objects from a pool. Having said 
that, I’d say that’s where the similarity ends. 

Head First: So what makes you different? 

Combination: Well, for starters we have very 
different attitudes. Permutation is very concerned 
about order, and really cares about the exact order in 
which objects are picked. Not only does he want to 
select objects, he wants to arrange them too. I mean, 
come on! 

Head First: I take it you don’t? 

Combination: No way! I’m sure Permutation 
shows a lot of dedication and all that, but quite 
frankly, life’s too short. As far as I’m concerned, if an 
object’s picked from the pool, then that’s all anyone 
needs to know. 

Head First: So are you better than Permutation? 

Combination: I wouldn’t like to say that either one 
of us is better as such; it just depends which of us is 
the most appropriate for the situation. Take music 
players, for instance. 

Head First: Music players? 

Combination: Yes. Lots of music players have 
playlists where you can choose which songs you want 
to play. 

Head First: I think I see where you’re headed... 

Combination: Now, Permutation and I 
are both interested in what’s on the playlist, but in 
different ways. I’m happy just knowing what songs 
are on it, but Permutation takes it way further. 
He doesn’t just want to know what songs are on 
the playlist, he wants to know the exact order too. 
Change the order of the songs, and it’s the same 
Combination, but a different Permutation. 

Head First: Tell me a bit about your calculation. 
Is calculating a Combination similar to how you’d 
calculate a Permutation? 

Combination: It’s similar, but there’s a slight 
difference. With Permutation, you find n!, and then 
divide it by (n-r)!. My calculation is similar, except 
that you divide by an extra r!. This makes me 
generally smaller, which makes sense because I’m 
not as fussy as Permutation. 

Head First: Generally smaller? 

Combination: I’ll phrase that differently. 
Permutation is never smaller than me. 

Head First: Combination, thank you for your time. 

Combination: It’s been a pleasure. 
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permutations and combinations 



Q: I’ve heard of something called 
“choose.” What’s that? 

A: It’s another term for the combination. 
nCr basically means “you have n objects, 
choose r,” so it’s sometimes called the 
choose function. 

Q: Can a permutation ever be smaller 
than a combination? 

A: Never. To calculate a combination, you 
divide the permutation by r!, which is never 
less than 1, so the end result can’t be larger. 

The closest you get to this is when a 
permutation and combination are identical. 
This is only ever the case when you’re 
choosing 0 objects or 1. 

Q: Which is a permutation and which 
is a combination? I get confused. 

A: A permutation is when you care about 
the number of possible arrangements of 
the objects you’ve chosen. A combination 
is when you don’t mind about their precise 
order; it’s enough that you’ve chosen them. 

Q: I get confused. If I want to find the 
number of combinations of choosing r 
objects from n, do I write that nCr or rCn? 

A: It’s nCr. One way of remembering this is 
that the higher of the two numbers is higher 
up in the shorthand (n is written raised, before 
the C, and r lowered, after it). 

Q: Are there other ways of writing 
this? I think I’ve seen combinations 
somewhere else, but they didn’t look like 
that. 

A: There are different ways of writing 
combinations. We’ve used the shorthand nCr, 
but an alternative is to write n above r inside 
a single pair of parentheses. 

Q: Are permutations and combinations 
really important? 

A: They are, particularly combinations. 
You’ll see more of these a bit later on in the 
book, so look out for when you might need 
them. 

Q: Dealing with permutations and 
combinations looks similar to when 
you’re dealing with like objects. Is that 
right? 

A: It’s a similar process. When you’re 
dealing with like objects, you divide the total 
number of arrangements by the number of 
ways in which you can arrange the like objects. 

For permutations, it’s as though you’re 
treating all the objects you don’t choose as 
being alike, so you divide n! by (n-r)!. For 
combinations, it’s as though the objects you 
pick are alike, too. This means you divide 
the number of permutations by r!. 



Vital Statistics 

Permutations 

If you choose r objects from a pool of n, 
the number of permutations is given by 

nPr = n! / (n - r)! 

Combinations 

If you choose r objects from a pool of n, 
the number of combinations is given by 

nCr = n! / (r! (n - r)!) 
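
The link between the two formulas, and the claim that a permutation is never smaller than the matching combination, can be tested over a range of small values. This is a sketch of our own; the range of n is an arbitrary choice.

```python
from math import comb, factorial, perm

equal_cases = set()
for n in range(0, 8):
    for r in range(0, n + 1):
        # nPr is nCr multiplied by the r! orderings of the chosen objects.
        assert perm(n, r) == comb(n, r) * factorial(r)
        # So a permutation count is never smaller than the combination count.
        assert perm(n, r) >= comb(n, r)
        if perm(n, r) == comb(n, r):
            equal_cases.add(r)

print(sorted(equal_cases))  # [0, 1]
```

The two counts only match when r is 0 or 1, because those are the only values with r! = 1.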
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combinations exercise 





The Statsville All Stars are due to play a basketball match. There are 12 players in the roster, 
and 5 are allowed on the court at any one time. 


1. How many different arrangements are there for choosing who’s on the court at the same time? 


2. The coach classes 3 of the players as expert shooters. What’s the probability that all 3 of these players will be on 
the court at the same time, if they’re chosen at random? 
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permutations and combinations 





It’s time for you to work out some poker probabilities. See how you get on. 

A poker hand consists of 5 cards and there are 52 cards in a pack. How many different 
arrangements are there? 


A royal flush is a hand that consists of a 10, Jack, Queen, King and Ace, all of the same suit. What’s the probability of 
getting this combination of cards? Use your answer above to help you. 


Four of a kind is when you have four cards of the same denomination. Any extra card makes up the hand. What’s the 
probability of getting this combination? 


A flush is where all 5 cards belong to the same suit. What’s the probability of getting this? 
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exercise solution 



The Statsville All Stars are due to play a basketball match. There are 12 players in the roster, 
and 5 are allowed on the court at any one time. 


1. How many different arrangements are there for choosing who’s on the court at the same time? 


There are 12 players in the roster, and we need to count the number of ways of choosing 5 of them. We 
don’t need to consider the order in which we pick the players, so we can work this out using combinations. 

12C5 = 12! / (5! (12 - 5)!) 

     = 12! / (5! 7!) 

     = 792 


2. The coach classes 3 of the players as expert shooters. What’s the probability that all 3 of these players will be on 
the court at the same time, if they’re chosen at random? 


Let’s start by finding the number of ways in which the three shooters can be on the court at the 
same time. 

If the three expert shooters are on the court at the same time, this means there are 2 more places 
left for the other players. We need to find the number of combinations of filling these 2 places 
from the remaining 9 players. 

9C2 = 9! / (2! (9 - 2)!) 

    = 9! / (2! 7!) 

    = 36 

This means the probability of all 3 shooters being on the court at the same time is 
36/792 = 1/22. 
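
Here is a short sketch that reproduces both parts of the basketball exercise with `math.comb`. The variable names are ours.

```python
from fractions import Fraction
from math import comb

# Part 1: choose 5 players from a roster of 12, order not important.
lineups = comb(12, 5)

# Part 2: the 3 shooters are fixed on court, so choose the other
# 2 players from the remaining 9.
lineups_with_shooters = comb(9, 2)

p_all_shooters = Fraction(lineups_with_shooters, lineups)

print(lineups)                # 792
print(lineups_with_shooters)  # 36
print(p_all_shooters)         # 1/22
```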
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permutations and combinations 


It’s time for you to work out some poker probabilities. See how you get on. 

A poker hand consists of 5 cards and there are 52 cards in a pack. How many different 
arrangements are there? 

There are 52 cards in a pack, and we want to choose 5. 

52C5 = 52! / (5! 47!) 

     = 2,598,960 

A royal flush is a hand that consists of a 10, Jack, Queen, King and Ace, all of the same suit. What’s the probability of 
getting this combination of cards? Use your answer above to help you. 

There’s one way of getting this combination for each suit, and there are 4 suits. This means 
the number of ways of getting a royal flush is 4. 

P(Royal Flush) = 4/2,598,960 

               = 1/649,740 
               = 0.0000015 

Four of a kind is when you have four cards of the same denomination. Any extra card makes up the hand. What’s the 
probability of getting this combination? 

Let’s start with the 4 cards of the same denomination. There are 13 denominations in total, which 
means there are 13 ways of combining these 4 cards. Once these 4 cards have been chosen, there are 48 cards 
left. This means the number of ways of getting this combination is 13 x 48 = 624. 

P(Four of a Kind) = 624/2,598,960 

                  = 1/4,165 
                  = 0.00024 

A flush is where all 5 cards belong to the same suit. What’s the probability of getting this? 

To find the number of possible combinations, find the number of ways of choosing a suit, and then choose 5 
cards from the suit. There are 13 cards in each suit. This means the number of combinations is 

4 x 13C5 = 4 x 13! / (5! 8!) 

         = 4 x 1,287 = 5,148 

P(Flush) = 5,148/2,598,960 

         = 33/16,660 

         = 0.00198 
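
The poker answers follow the same pattern, so they are easy to reproduce. The sketch below uses the exercise's own definitions; in particular, its flush count includes straight flushes and royal flushes, because the question only asks for five cards of one suit.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 5)              # all 5-card hands

royal_flushes = 4                # one per suit
four_of_a_kind = 13 * 48         # denomination, then any one extra card
flushes = 4 * comb(13, 5)        # suit, then 5 of its 13 cards

p_royal = Fraction(royal_flushes, hands)
p_four = Fraction(four_of_a_kind, hands)
p_flush = Fraction(flushes, hands)

for name, p in [("royal flush", p_royal),
                ("four of a kind", p_four),
                ("flush", p_flush)]:
    print(name, p, float(p))
```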
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hooray for toupee! 


It’s the end of the race 

The race between the twenty horses is over, and the 
overall winner is Ruby Toupee, followed by Cheeky 
Sherbet and Frisky Funboy. If you decided to bet on 
these three horses, you just won big! 



Winner of this year's 
Statsville Derby: 
Ruby Toupee 

2nd place: 
Cheeky Sherbet 

3rd place: 
Frisky Funboy 


In this chapter, you’ve learned how to cope with different 
arrangements, and how to quickly count the number of 
possible combinations and permutations without having to 
work out each and every possibility. 

The sort of knowledge you’ve gained gives you enormous 
probability and statistical power. Keep reading, and we’ll 
show you how to gain even greater mastery. 


7 geometric, binomial, and poisson distributions 

+ Keeping Things Discrete + 



Calculating probability distributions takes time. 

So far we’ve looked at how to calculate and use probability distributions, but wouldn't it be 
nice to have something easier to work with, or just quicker to calculate? In this chapter, 
we’ll show you some special probability distributions that follow very definite patterns. 
Once you know these patterns, you’ll be able to use them to calculate probabilities, 
expectations, and variances in record time. Read on, and we’ll introduce you to the 
geometric, binomial and Poisson distributions. 


this is a new chapter 
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watch out for that tree! 



Chad likes to snowboard, but he’s accident-prone. If there’s a lone 
tree on the slopes, you can guarantee it will be right in his path. Chad 
wishes he didn’t keep hitting trees and falling over; his insurance is 
costing him a fortune. 


Chad’s about here; just 
follow the tree damage 
to see how well his 
run went. 


There’s a lot riding on Chad’s performance on the slopes: his ego, his 
success with the ski bunnies on the trail, his insurance premiums. If it’s 
likely he’ll make it down the slopes in less than 10 tries, he’s willing to 
risk embarrassment, broken bones, and a high insurance deductible to 
try out some new snowboarding tricks. 

The probability of Chad making a clear run down the slope is 0.2, and 
he’s going to keep on trying until he succeeds. After he’s made his first 
successful run down the slopes, he’s going to stop snowboarding, and 
head back to the lodge triumphantly. 
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geometric, binomial, and poisson distributions 



Chad is remarkably resilient, 
and any collisions in a given run 
don’t affect his performance in 
future trials. 



It’s time to exercise your probability skills. The probability of Chad 
making a successful run down the slopes is 0.2 for any given trial 
(assume trials are independent). What's the probability he'll need 
two trials? What’s the probability he’ll make a successful run down 
the slope in one or two trials? Remember, when he’s had his first 
successful run, he’s going to stop. 

Hint: You may want to draw a 
probability tree to help visualize 
the problem. 
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sharpen solution 


Sharpen your pencil 

Solution 


It’s time to exercise your probability skills. The probability of Chad 
making a successful run down the slopes is 0.2 for any given trial 
(assume trials are independent). What’s the probability he’ll need 
two trials? What’s the probability he’ll make a successful run down 
the slope in one or two trials? Remember, when he's had his first 
successful run, he’s going to stop. 

Here’s a probability tree for the first two trials, as these are all that’s needed to work out the 
probabilities. 

Trial 1: Success (0.2) or Fail (0.8) 
Trial 2 (after a fail in trial 1): Success (0.2) or Fail (0.8) 


If we say X is the number of trials needed to get down the slopes, then 

P(X = 1) = P(Success in trial 1) 

         = 0.2 

P(X = 2) = P(Success in trial 2 ∩ Failure in trial 1) 

         = 0.2 x 0.8 

         = 0.16 


P(X ≤ 2) = P(X = 1) + P(X = 2) 

         = 0.2 + 0.16 

         = 0.36 


We can add these probabilities because the two outcomes are mutually exclusive: 
Chad can’t need exactly one trial and exactly two trials at the same time. 
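
The arithmetic for the first two trials can be written out in a few lines. This sketch uses exact fractions to avoid rounding noise; the variable names are ours.

```python
from fractions import Fraction

p = Fraction(1, 5)   # probability of a clear run (0.2)
q = 1 - p            # probability of a failed run (0.8)

p_x1 = p             # success on the first trial
p_x2 = q * p         # fail first, then succeed
p_at_most_2 = p_x1 + p_x2

# Cross-check: needing at most 2 trials is the opposite of failing twice.
complement = 1 - q ** 2

print(float(p_x1), float(p_x2), float(p_at_most_2))  # 0.2 0.16 0.36
print(p_at_most_2 == complement)                     # True
```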
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geometric, binomial, and poisson distributions 


We need to find Chad’s probability distribution 

So far you’ve found the probability that Chad will need fewer than three 
attempts to make it down the slope. But what if you needed to look at the 
probability of him needing fewer than 10 attempts (for insurance reasons), 
or even 20 or 100? 

Rather than work out the probabilities from scratch every time, it would 
be useful if we could use a probability distribution. To do this, we need 
to work out the probability for every single possible number of attempts 
Chad needs to get down the slope. 



Hang on. If we have to work 
out every single probability, we’ll 
be here forever. 


There’s a problem because the number of possibilities 
is neverending. 

Chad will continue with his attempts to make it down the slope until he is 
successful. This could take him 1 attempt, 10 attempts, 100 attempts, or 
even 1,000 attempts. There are no guarantees about exactly when Chad 
will first successfully make it down the slopes. 


So you expect me to come up 
with the probability distribution 
of something that’s neverending? Is 
that your idea of a joke? 


Even though it’s neverending, there’s still a way of 
figuring out this type of probability distribution. 

This is actually a special kind of probability distribution, with special 
properties that make it easy to calculate probabilities, along with the 
expectation and variance. 

Let’s see if we can figure it out. 
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chad’s probability tree 


There's a pattern to this probability distribution 

Let’s define the variable X to be the number of trials needed for 
Chad to make a successful run down the slope. Chad only needs 
to make one successful run, and then he’ll stop. 

Let’s start off by examining the first four trials so that we can 
calculate probabilities for the first four values of X. By doing this, 
we can see if there’s some sort of pattern that will help us to easily 
work out the probabilities of other values. 



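[Probability tree for the first four trials. At each trial Chad either succeeds 
(probability 0.2) and stops, or fails (probability 0.8) and moves on to the next trial.] 

P(X = 1) is the probability that Chad is successful in his first trial. 

P(X = 2) is the probability that Chad is successful in his second trial, having 
failed in his first. 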

Here are the probabilities for the first four values 
of X. 


x    P(X = x) 
1    0.2 
2    0.8 x 0.2 = 0.16 
3    0.8 x 0.8 x 0.2 = 0.128 
4    0.8 x 0.8 x 0.8 x 0.2 = 0.1024 


Notice that each probability is 
composed by multiplying 
different powers of 0.8 and 
0.2 together. 
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geometric, binomial, and poisson distributions 



Exercise

Here’s a table containing the probabilities of X for different values. Complete the table, filling
out the probability that there will be x number of trials, and indicating what the power of 0.8 and
0.2 are in each case (the number of times 0.8 and 0.2 appear in P(X = x)).


x    P(X = x)               Power of 0.8    Power of 0.2
1    0.2                    0               1
2    $0.8 \times 0.2$       1               1
3    $0.8^2 \times 0.2$     2
4
5
r

r is a particular value of x, but we’re not saying which one. Can you
guess what the probability will be in terms of r?
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exercise solution 


Here’s a table containing the probabilities of X for different values. Complete the table, filling
out the probability that there will be x number of trials, and indicating what the power of 0.8 and
0.2 are in each case (the number of times 0.8 and 0.2 appear in P(X = x)).

x    P(X = x)                   Power of 0.8    Power of 0.2
1    0.2                        0               1
2    $0.8 \times 0.2$           1               1
3    $0.8^2 \times 0.2$         2               1
4    $0.8^3 \times 0.2$         3               1
5    $0.8^4 \times 0.2$         4               1
r    $0.8^{r-1} \times 0.2$     r - 1           1

For X = 4, Chad fails three times and succeeds on his fourth attempt.
P(X = 4) is therefore 0.8 x 0.8 x 0.8 x 0.2, as the probability of failing on a run is 0.8 and the
probability of success is 0.2.

For X = 5, Chad fails on his first four attempts and succeeds on his fifth. This means
P(X = 5) = 0.8 x 0.8 x 0.8 x 0.8 x 0.2.

So what is P(X = r)? For Chad to be successful on his rth attempt, he must have failed in his first (r - 1)
attempts, before succeeding in his rth. Therefore
P(X = r) = 0.8 x 0.8 x ... x 0.8 x 0.2, which means that in our expression, 0.8 is taken to the (r - 1)th power.
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First you say P(X = x), 
then you say P(X = r). I wish 
you'd make your mind up. 


They refer to two different things. 

When we use P(X = x), we’re using it to demonstrate x taking on any value 
in the probability distribution. In the table above, we show various values 
of x, and we calculate the probability of getting each of these values. 

When we use P(X = r), x takes on the particular value r. We’re looking 
for the probability of getting this specific value. It’s just that we haven’t 
specified what the value of r is so that we can come up with a generalized 
calculation for the probability. 

It’s a bit like saying that x can take on any value, including the fixed value r. 



















The probability distribution can be represented algebraically 


As you can see, the probabilities of Chad’s snowboarding trials follow a
particular pattern. Each probability is a product of powers of 0.8 and 0.2.
You can quickly work out the probabilities for any value r by using:

$P(X = r) = 0.8^{r-1} \times 0.2$

In other words, if you want to find P(X = 100), you don’t have to draw an
enormous probability tree to work out the probability, or think your way
through exactly what happens in every trial. Instead, you can use:

$P(X = 100) = 0.8^{99} \times 0.2$
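As a quick check of this shortcut, here is a minimal Python sketch that evaluates the formula directly. The function name `prob_first_success` is our own label, not something from the book, and the default of 0.2 is Chad’s success probability from the text.

```python
def prob_first_success(r, p=0.2):
    """Probability that the first success happens on trial r."""
    q = 1 - p                 # probability of failing a single trial
    return q ** (r - 1) * p   # (r - 1) failures, then one success


# The first four values should match the table built from the probability tree.
for r in (1, 2, 3, 4):
    print(r, round(prob_first_success(r), 4))

# No enormous tree needed for r = 100; the result is a very small number.
print(prob_first_success(100))
```

The loop is only there to confirm the formula against the hand-built table before trusting it for a value as large as 100.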



We can generalize this even further. If the probability of success in a trial
is represented by p and the probability of failure is 1 - p, which we’ll call
q, we can work out any probability of this nature by using:

$P(X = r) = q^{r-1} p$

That’s (r - 1) failures and 1 success. In our case, p = 0.2 and q = 0.8.

This formula is called the geometric distribution.


There are no Dumb Questions


What’s the point in generalizing 
this? It’s just one particular problem 
we’re dealing with. 

We’re generalizing it so that we can 
apply the results to other similar problems. If 
we can generalize the results for this kind of 
problem, it will be quicker to use it for other 
similar situations in the future. 

You said we needed to find an 
expression for P(X = r). What’s r? 

P(X = r) means “the probability that X 
is equal to value r," where r is the number of 
trials we need to get the first success. 

If you wanted to find, say, P(X = 20), you
could substitute 20 for r. This would give you
a quick way of finding the probability.


Why is it the letter r? Why not some 
other letter? 

We used the letter r so that we could 
generalize the result for any particular 
number. We could have used practically any 
other letter, but using r is common. 

How can we have a probability 
distribution if the number of possibilities 
is endless? 

We don’t have to specify a probability 
distribution by physically listing the 
probability of every possible outcome. The 
key thing is that we need a way of describing 
every possibility, which we can do with a 
formula for computing the probability. 


Wouldn’t Chad’s snowboarding 
skills eventually improve? Is it realistic to 
say the probability of success is 0.2 for 
every trial? 

That may be a fair assumption. But 
in this problem, Chad is truly hapless when 
it comes to snowboarding, and we have to 
assume that his skills won’t improve—which 
means his probability of success on the 
slopes will follow the geometric distribution. 
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geometric distribution in depth 


The geometric distribution has a distinctive shape. 

P(X = r) is at its highest when r = 1, and it gets lower and 
lower as r increases. Notice that the probability of getting 
a success is highest for the first trial. This means that the 

mode of any geometric distribution is always 1, 

as this is the value with the highest probability. 

This may sound counterintuitive, but it’s most likely that 
only one attempt will be needed for a successful outcome. 








Geometric Distribution Up Close


We said that Chad’s snowboarding exploits are an example of the geometric 
distribution. The geometric distribution covers situations where: 


1. You run a series of independent trials.

2. There can be either a success or failure for each trial, and the
probability of success is the same for each trial.

3. The main thing you’re interested in is how many trials are needed in
order to get the first successful outcome.


So if you have a situation that matches this set of criteria, you can use the 
geometric distribution to help you take a few shortcuts. The important thing 
to be aware of is that we use the word “success” to mean that the event 
we’re interested in happens. If we’re looking for an event that has negative 
connotations, in statistical terms it’s still counted as a success. 

Let’s use the variable X to represent the number of trials needed to 
get the first successful outcome — in other words, the number of trials 
needed for the event we’re interested in to happen. 

To find the probability of X taking a particular value r, you can get a quick 
result by using: 

$P(X = r) = p q^{r-1}$

where p is the probability of success, and q = 1 - p, the probability of failure.
In other words, to get a success on the rth attempt, there must first have been
(r - 1) failures.
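The formula above can be sketched in Python for any success probability, not just Chad’s. This is an illustration under our own assumptions: the name `geometric_pmf` and the cutoff of 200 trials are ours, chosen only so the list is long enough for the leftover probability to be negligible.

```python
def geometric_pmf(r, p):
    """P(X = r) for X ~ Geo(p): (r - 1) failures followed by one success."""
    if r < 1 or not 0 < p <= 1:
        raise ValueError("need r >= 1 and 0 < p <= 1")
    q = 1 - p
    return p * q ** (r - 1)


# Probabilities for Chad (p = 0.2) over the first 200 possible values of r.
probs = [geometric_pmf(r, 0.2) for r in range(1, 201)]

print([round(x, 4) for x in probs[:4]])   # first four values of P(X = r)
print(probs.index(max(probs)) + 1)        # the value of r with the highest probability
print(sum(probs))                         # total probability covered by these 200 values
```

Because each probability is the previous one multiplied by q, the list can only shrink as r grows, which is why the highest value sits at the very start.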


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The geometric distribution also works with inequalities 

As well as finding exact probabilities for the geometric distribution, there’s also 
a quick way of finding probabilities that deal with inequalities. 

Let’s start with P(X > r). 

P(X > r) is the probability that more than r trials will be needed in order to get 
the first successful outcome. In order for more than r trials to be needed, this 
means that the first r trials must have ended in failure. This means that you 
find the probability by multiplying the probability of failure together r times. 

$P(X > r) = q^r$

We can use this to find P(X ≤ r), the probability that r or fewer trials are
needed in order for there to be a successful outcome.

If we add together P(X ≤ r) and P(X > r), the total must be 1. This means that

$P(X \le r) + P(X > r) = 1$

or

$P(X \le r) = 1 - P(X > r)$

This gives us

$P(X \le r) = 1 - q^r$

From above, we know that $P(X > r) = q^r$,
so we substitute $q^r$ for P(X > r) to
get this formula.


If a variable X follows a geometric distribution where the probability of 
success in a trial is p, this can be written as 



X ~ Geo(p) 
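The two inequality shortcuts can be checked against a brute-force sum. The sketch below is ours rather than the book’s: the function names are hypothetical, and the values p = 0.2 and r = 4 are picked only as an example.

```python
def prob_more_than(r, p):
    """P(X > r): the first r trials all end in failure."""
    return (1 - p) ** r


def prob_at_most(r, p):
    """P(X <= r): the complement of P(X > r)."""
    return 1 - (1 - p) ** r


p = 0.2
r = 4

# Brute force: add up P(X = k) for k = 1..r and compare with the shortcut.
direct = sum(p * (1 - p) ** (k - 1) for k in range(1, r + 1))

print(prob_at_most(r, p), direct)
print(prob_more_than(r, p))
```

The shortcut and the direct sum should agree up to floating-point rounding, and the two shortcut values should add up to 1.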






I’m getting bruised! How
many attempts do you
expect me to have to make
before I make it down the
slope OK?










geometric expectation 


The pattern of expectations for the geometric distribution 


So far we’ve found probabilities for the number of attempts Chad
needs to make before he successfully makes it down the slope, but what
if we want to find the expectation and variance? If we know the
expectation, for instance, we’ll be able to say how many attempts we
expect Chad to make before he’s successful.

Can you remember how we found expectations earlier in the book?
We find E(X) by calculating $\sum x P(X = x)$. The probabilities in this
case go on forever, but let’s start by working out the first few values to
see if there’s some sort of pattern.

Here are the first few values of x, where X ~ Geo(0.2)


As a reminder, expectation is the average value you expect to get, a bit like
the mean but for probability distributions. Variance is a measure of how much
you can expect this to vary by.


x    P(X = x)                              xP(X = x)     xP(X ≤ x)
1    0.2                                   0.2           0.2
2    $0.8 \times 0.2 = 0.16$               0.32          0.52
3    $0.8^2 \times 0.2 = 0.128$            0.384         0.904
4    $0.8^3 \times 0.2 = 0.1024$           0.4096        1.3136
5    $0.8^4 \times 0.2 = 0.08192$          0.4096        1.7232
6    $0.8^5 \times 0.2 = 0.065536$         0.393216      2.116416
7    $0.8^6 \times 0.2 = 0.0524288$        0.3670016     2.4834176
8    $0.8^7 \times 0.2 = 0.04194304$       0.33554432    2.81896192

The last column, xP(X ≤ x), is the running total of xP(X = x).


Can you see what happens to the values of xP(X = x)?

The values of xP(X = x) start off small, and then they get larger until x = 5. When 
x is larger than 5, the values start decreasing again, and keep on decreasing as x 
gets larger. As x gets larger, xP(X = x) becomes smaller and smaller until it makes 
virtually no difference to the running total. 

We can see this more clearly if we chart the cumulative total of xP(X = x): 
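In place of the chart, the running total can be computed directly. This is a minimal sketch under our own assumptions: it stops at 100 trials, which is an arbitrary cutoff chosen because later terms are too small to change the total noticeably.

```python
p = 0.2
q = 1 - p

running_total = 0.0
totals = []
for x in range(1, 101):
    running_total += x * q ** (x - 1) * p   # add x * P(X = x)
    totals.append(running_total)

# Print the running total at a few values of x to watch it level off.
for x in (1, 2, 8, 50, 100):
    print(x, round(totals[x - 1], 6))
```

The early values grow quickly, while the later ones barely move, which is the flattening the chart is meant to show.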
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Expectation is 1/p 


Drawing the chart for the running total of xP(X = x) shows you that as x gets 
larger, the running total gets closer and closer to a particular value, 5. In fact, the 
running total of xP(X = x) for an infinite number of trials is 5 itself. This means 
that 

E(X) = 5 

This makes intuitive sense. The probability of a successful outcome is 0.2. This is a 
bit like saying that 1 in 5 attempts tend to be successful, so we can expect Chad to 
make 5 attempts before he is successful. 

We can generalize this for any value p. If X ~ Geo(p) then

$E(X) = \frac{1}{p}$

The expectation is 1 divided by the probability of success.

I can expect to
make it down in 5
tries? Not bad!


We’re not just limited to finding the expectation of the geometric distribution, 
we can find the variance too. 


Sharpen your pencil

Let’s see if we can find an expression for the variance of the
geometric distribution in the same way that we did for the
expectation. Complete the table below. What do you notice?

x     P(X = x)                                $x^2 P(X = x)$    $x^2 P(X \le x)$
1     0.2
2     $0.8 \times 0.2 = 0.16$
3     $0.8^2 \times 0.2 = 0.128$
4     $0.8^3 \times 0.2 = 0.1024$
5     $0.8^4 \times 0.2 = 0.08192$
6     $0.8^5 \times 0.2 = 0.065536$
7     $0.8^6 \times 0.2 = 0.0524288$
8     $0.8^7 \times 0.2 = 0.04194304$
9     $0.8^8 \times 0.2 = 0.033554432$
10    $0.8^9 \times 0.2 = 0.0268435456$

Remember, $Var(X) = E(X^2) - E^2(X)$.
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Let’s see if we can find an expression for the variance of the 
geometric distribution in the same way that we did for the 
expectation. Complete the table below. What do you notice? 


Solution

x     P(X = x)                                $x^2 P(X = x)$    $x^2 P(X \le x)$
1     0.2                                     0.2               0.2
2     $0.8 \times 0.2 = 0.16$                 0.64              0.84
3     $0.8^2 \times 0.2 = 0.128$              1.152             1.992
4     $0.8^3 \times 0.2 = 0.1024$             1.6384            3.6304
5     $0.8^4 \times 0.2 = 0.08192$            2.048             5.6784
6     $0.8^5 \times 0.2 = 0.065536$           2.359296          8.037696
7     $0.8^6 \times 0.2 = 0.0524288$          2.5690112         10.6067072
8     $0.8^7 \times 0.2 = 0.04194304$         2.68435456        13.29106176
9     $0.8^8 \times 0.2 = 0.033554432$        2.717908992       16.008970752
10    $0.8^9 \times 0.2 = 0.0268435456$       2.68435456        18.693325312

This time $x^2 P(X = x)$ increases until x reaches 10. Once x reaches 10 it starts to go down.



I get it, so $x^2 P(X = x)$ gets larger
for a while, but after that, it gets
smaller and smaller as x gets larger
and larger.


That’s right.

$x^2 P(X = x)$ gets larger and larger up until a certain point, and then it starts
decreasing again. Eventually it becomes very close to 0.
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Finding the variance for our distribution 

So how does this help us find the variance of the number of trials it takes 
Chad to make a successful run down the slopes? 

We find the variance of a probability distribution by calculating

$Var(X) = E(X^2) - E^2(X)$

This means that we calculate $\sum x^2 P(X = x)$, and then subtract E(X) squared.
By graphing the running total against the values of x, you can see the
pattern of Var(X) as x increases. Here’s the graph of $x^2 P(X \le x) - E^2(X)$



As x gets larger, the value of $x^2 P(X \le x) - E^2(X)$ gets closer and closer to a
particular value, this time 20.

As with the expectation, we can generalize this. If X ~ Geo(p) then

$Var(X) = \frac{q}{p^2}$
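The same convergence can be checked numerically. The sketch below is an illustration rather than the book’s method: it sums $x^2 P(X = x)$ over the first 300 trials (an arbitrary cutoff of ours) and subtracts E(X) squared, using E(X) = 1/p from earlier.

```python
p = 0.2
q = 1 - p
expectation = 1 / p                       # E(X) for X ~ Geo(p)

sum_x2 = 0.0
for x in range(1, 301):
    sum_x2 += x ** 2 * q ** (x - 1) * p   # add x^2 * P(X = x)

variance_estimate = sum_x2 - expectation ** 2

print(variance_estimate)   # partial-sum estimate of Var(X)
print(q / p ** 2)          # value given by the shortcut formula
```

The two printed numbers should agree to many decimal places, since the terms beyond x = 300 are vanishingly small.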
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geometric distribution cheat sheet 


A quick guide to the geometric distribution 

Here’s a quick summary of everything you could possibly need to know about the geometric distribution.


When do I use it?

Use the geometric distribution if you’re running independent trials, each one can have a success or failure, and
you’re interested in how many trials are needed to get the first successful outcome.


How do I calculate probabilities?

Use the following handy formulae. p is the probability of success in a trial, q = 1 - p, and X is the number of
trials needed in order to get the first successful outcome. We say X ~ Geo(p).

$P(X = r) = p q^{r-1}$
The probability of the first success being in the rth trial.

$P(X > r) = q^r$
The probability you’ll need more than r trials to get your first success.

$P(X \le r) = 1 - q^r$
The probability you’ll need r trials or less to get your first success.


What about the expectation and variance?

Just use the following:

$E(X) = \frac{1}{p}$

$Var(X) = \frac{q}{p^2}$


There are no Dumb Questions


Can I trust these formulae? Can 
I use them any time I need to find 
probabilities and expectations? 

You can use these shortcuts whenever 
you’re dealing with the geometric distribution, 
as they’re shortcuts for that probability 
distribution. If you’re dealing with a situation 
that can’t be modelled by the geometric 
distribution, don’t use these shortcuts. 

Remember, the geometric distribution is 
used for situations where you’re running 
independent trials (so the probability stays 
the same for each one), each trial ends in 
either success or failure, and the thing you’re 
interested in is how many trials are needed 
to get the first successful outcome. 


What about if my circumstances 
are different? What if I have a fixed 
number of trials and I want to find the 
number of successful outcomes? 

You can’t use the geometric 
distribution to model this sort of situation, but 
don’t worry, there are other methods. 

Do I have to learn all of these
shortcuts?

If you have to deal with the geometric
distribution, knowing the formulae will
save you a lot of time. If you’re sitting for a
statistics exam, check whether your exam
syllabus covers it.


Why does the distribution use the 
letters p and q? 

The letter p stands for probability. In 
this case, it's the probability of getting a 
successful outcome in one trial. 

The letter q is often used in statistics to 
represent 1 - p, or p 1 . You’ll see quite a lot 
of it through the rest of this chapter and the 
rest of the book. 
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BE th^ Wei 



The probability that another
snowboarder will make it down
the slope without falling over is
0.4. Your job is to play like you’re
the snowboarder and work out the
following probabilities for
your slope success.


1. The probability that you will be successful on your second attempt, while failing on your first. 


2. The probability that you will be successful in 4 attempts or fewer. 


3. The probability that you will need more than 4 attempts to be successful. 


4. The number of attempts you expect you'll need to make before being successful. 


5. The variance of the number of attempts. 
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be the snowboarder solution 


BE the snowboarder Solution

The probability that another
snowboarder will make it down
the slope without falling over is
0.4. Your job is to play like you’re
the snowboarder and work out the
following probabilities for
your slope success.

Let’s use X ~ Geo(0.4), where X
is the number of trials needed by
this second snowboarder to make a
clean run down the slope.


1. The probability that you will be successful on your second attempt, while failing on your first.

$P(X = 2) = p \times q$
$= 0.4 \times 0.6$
$= 0.24$


2. The probability that you will be successful in 4 attempts or fewer.

$P(X \le 4) = 1 - q^4$
$= 1 - 0.6^4$
$= 1 - 0.1296$
$= 0.8704$


3. The probability that you will need more than 4 attempts to be successful.

$P(X > 4) = q^4$
$= 0.6^4$
$= 0.1296$

Or you could have found this by using
$P(X > 4) = 1 - P(X \le 4) = 1 - 0.8704 = 0.1296$


4. The number of attempts you expect you’ll need to make before being successful.

$E(X) = 1/p$
$= 1/0.4$
$= 2.5$


5. The variance of the number of attempts.

$Var(X) = q/p^2$
$= 0.6/0.4^2$
$= 0.6/0.16$
$= 3.75$
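The five answers can be reproduced with a few lines of Python. This is a minimal sketch that assumes only the geometric shortcuts from the cheat sheet and the success probability of 0.4 given in the exercise; the dictionary labels are ours.

```python
p = 0.4        # probability of a clean run for the second snowboarder
q = 1 - p

answers = {
    "P(X = 2)": p * q,          # one failure, then a success
    "P(X <= 4)": 1 - q ** 4,
    "P(X > 4)": q ** 4,
    "E(X)": 1 / p,
    "Var(X)": q / p ** 2,
}

for label, value in answers.items():
    print(f"{label} = {value:.4f}")
```

Keeping the answers in one dictionary makes it easy to rerun the whole exercise for a different value of p.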
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o ° 

You’ve mastered the geometric
distribution

Thanks to your skills with the geometric distribution, Chad not only 
knows the probability of him making a clear run down the slopes after 
any number of tries, but also how many times he can expect it to take to 
get down the hill successfully, and how much variability there is. 

With an expectation of 5 tries to make it down the slopes, and a 
variance of 20, he feels much more confident he can impress the ladies 
without serious bodily harm. 

Now let’s move on to... 

Ladies and gentlemen, we interrupt this chapter to bring you an exciting
installment of Statsville’s favorite quiz show: Who Wants To Win A Swivel Chair
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Hello, and welcome to Who Wants To 
Win A Swivel Chair, Statsville’s favorite
quiz show. We’ve got some fiendishly 
difficult questions on the show tonight. 
I hope you’re feeling lucky. 





We’ve got some great questions for you today,
so let’s get started. In Round One I’m going to ask you
three questions, and for each question there are four possible
answers. You can quit now and walk away with the consolation
prize, but if you play on and beat your competitors, you’ll move
on to the next round and be one step closer to winning a swivel
chair. The title of Round One is "All About Me." Good luck!



Sharpen your pencil

Here are the questions for Round One. The
questions are all about the game show host.
Put a check mark next to the correct answer.

1. What’s his favorite color?
   A: Red
   B: Blue
   C: Green
   D: Yellow

2. In what month is his birthday?
   A: January
   B: February
   C: March
   D:

3. What do people like most about him?
   A: Good looks
   B: Charm
   C: Sense of Humor
   D: Intelligence

There are no Dumb Questions


What’s a quiz show doing in the middle of my chapter? I 
thought we were talking about probability distributions. 

We still are. This situation is ideal for another sort of probability 
distribution. Keep reading and everything will become clear. 


I don’t know the answers to these questions. What should
I do?

If you don’t know the answers you’ll have to answer them at 
random. Give it your best shot - you might win a swivel chair. 
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Should you play, or walk away? 


It’s unlikely you’ll know the game show host well enough to answer these 
questions, so let’s see if we can find the probability distribution for the number 
of questions you’ll get correct if you choose answers at random. That should 
help you decide whether or not to play on. 

Here’s a probability tree for the three questions: 


[Figure: probability tree for Questions 1 to 3. Each question branches into Correct, with probability 0.25, and Incorrect, with probability 0.75.]

Sharpen your pencil

What are the probabilities for this problem? What sort of pattern
can you see? We’re using X to represent the number of questions
you get correct out of three.

x    P(X = x)     Power of 0.75    Power of 0.25
0    $0.75^3$     3                0
1
2
3
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^Jhdrpen your pencil 

Solution

What are the probabilities for this problem? What sort of pattern
can you see? We’re using X to represent the number of questions
you get correct out of three.

x    P(X = x)                                  Power of 0.75    Power of 0.25
0    $0.75^3 = 0.422$                          3                0
1    $3 \times 0.75^2 \times 0.25 = 0.422$     2                1
2    $3 \times 0.75 \times 0.25^2 = 0.141$     1                2
3    $0.25^3 = 0.016$                          0                3

There are 3 different ways you can get one question right, and
all of them have a probability of $0.75^2 \times 0.25$.

[Figure: probability tree for Questions 1 to 3, with each Correct branch labeled 0.25 and each Incorrect branch labeled 0.75.]




Think back to when you looked at permutations 
and combinations in Chapter 6. How do you think 
they might help you with this sort of problem? 
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&eneralizmg the probability for three questions 


So far we’ve looked at the probability distribution of X, the number of 
questions we answer correctly out of three. 


Just as with the geometric distribution, there seems to be a pattern in 
the way the probabilities are formed. Each probability contains different 
powers of 0.75 and 0.25. As x increases, the power of 0.75 decreases 
while the power of 0.25 increases. 


In general, P(X = r) is given by:

$P(X = r) = \;? \times 0.25^r \times 0.75^{3-r}$

Here r is the number of questions we get right, $0.25^r$ is the probability of
getting r questions right, and $0.75^{3-r}$ is the probability of getting
(3 - r) questions wrong. What’s the missing factor?

In other words, to find the probability of getting exactly r questions right,
we calculate $0.25^r$, multiply it by $0.75^{3-r}$, and then multiply the whole lot by
some number. But what?


What’s the missing number?


For each probability, we need to answer a certain number of questions
correctly, and there are different ways of achieving this. As an example,
there are three different ways of answering exactly one question correctly
out of three questions. Another way of looking at this is that there are 3
different combinations.

Just to remind you, a combination $^nC_r$ is the number of ways of choosing r
objects from n, without needing to know the exact order. This is exactly the
situation we have here. We need to choose r correct questions from 3.

This means that the probability of getting r questions correct out of 3 is
given by

$P(X = r) = {}^3C_r \times 0.25^r \times 0.75^{3-r}$

So, by this formula, the probability of getting 1 question
correct is:

$P(X = 1) = {}^3C_1 \times 0.25^1 \times 0.75^{3-1}$
$= \frac{3!}{1!\,(3-1)!} \times 0.25 \times 0.5625$
$= \frac{6}{2} \times 0.25 \times 0.5625$
$= 0.422$ (to three decimal places)

This is the same result we got in the table earlier.
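The whole Round One distribution follows from the same formula. Here is a minimal Python sketch, assuming random guessing with a 0.25 chance per question; it uses `math.comb` from the standard library for the number of combinations, and the variable names are ours.

```python
from math import comb

p = 0.25   # chance of guessing one question correctly
n = 3      # number of questions in Round One

# P(X = r) = nCr * p^r * (1 - p)^(n - r) for r = 0, 1, 2, 3
distribution = {r: comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)}

for r, prob in distribution.items():
    print(r, round(prob, 3))
```

The four probabilities should match the solution table and add up to 1.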
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Solution 


Here are the questions for Round One. The questions are all
about the game show host.

1. What’s his favorite color?
   A: Red
   B: Blue
   C: Green
   D: Yellow

2. In what month is his birthday?
   A: January
   B: February
   C: March
   D:

3. What do people like most about him?
   A: Good looks
   B: Charm
   C: Sense of Humor
   D: Intelligence
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Looks like you tied 
with another contender. 
Congratulations, you’re 
through to the next round. 
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Round Two of Who Wants To Win A Swivel Chair 
is called "More About Me." This time I’ll ask you five
questions. As before, there are four possible answers 
to each question. Do you want to play on? 


Sharpen your pencil

Here are the questions for Round Two. The
questions are all about the game show host.

1. What was the name of his first girlfriend?
   Mary
   Maggie
   Marie

2. What would be an ideal gift for him?
   A statue
   A horse
   A dog
   A hovercraft

3. What is his greatest achievement?
   Hosting a quiz show
   Winning Mr Statsville 2008
   Raising $1000 for the seal sanctuary
   Releasing an album

4. What is his secret ambition?
   To launch a range of sports equipment
   To launch his own range of menswear
   To release an exercise DVD
   To have his own hair care range

5. In what year was he abducted by aliens?

It looks like these questions are just as obscure as the ones in the previous round, so 
you’ll have to answer questions at random again. 

Let’s see if we can work out the probability distribution for this new set of questions. 
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Lcfs generalize the probability further 

So far you’ve seen that the probability of getting r questions correct out 
of 3 is given by 


$P(X = r) = {}^3C_r \times 0.25^r \times 0.75^{3-r}$


where the probability of answering a question correctly is 0.25, and the 
probability of answering incorrectly is 0.75. 

The next round of Who Wants To Win A Swivel Chair has 5 questions 
instead of 3. Rather than rework this probability for 5 questions, let’s 
rework it for n questions instead. That way we’ll be able to use the same 
formula for every round of Who Wants To Win A Swivel Chair. 



So what’s the formula for the probability of getting r questions right out of
n? It’s actually


$P(X = r) = {}^nC_r \times 0.25^r \times 0.75^{n-r}$



What if the probability of getting a 
question right changes? I wonder if we 
can generalize this further. 


Yes, we can generalize this further. 

Imagine the probability of getting a question right is given by p, and 
the probability of getting a question wrong is given by 1 — p, or q. The 
probability of getting r questions right out of n is given by 

$P(X = r) = {}^nC_r \times p^r \times q^{n-r}$


This sort of problem is called the binomial distribution. Let’s take a 
closer look. 
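The general formula can be sketched as a small Python function. The name `binomial_pmf` and its argument order are our own choices for illustration; `math.comb` from the standard library supplies the number of combinations.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r): probability of exactly r successes in n independent trials."""
    if not 0 <= r <= n:
        return 0.0                 # impossible counts have zero probability
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


# Round Two: five questions, four possible answers each, guessing at random.
round_two = [binomial_pmf(r, 5, 0.25) for r in range(6)]
print([round(x, 4) for x in round_two])
```

Because n and p are parameters, the same function covers Round One (n = 3) and any later round without reworking the probabilities by hand.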
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x-B(n, p) 


The exact shape of the binomial distribution varies 
according to the values of n and p. The closer to 0.5 p is, 
the more symmetrical the shape becomes. In general it is 
skewed to the right when p is below 0.5, and skewed to the 
left when p is greater than 0.5. 


Binomial Distribution Up Close


Guessing the answers to the questions on Who Wants To Win A Swivel 
Chair is an example of the binomial distribution. The binomial 
distribution covers situations where 



1. You’re running a series of independent trials.

2. There can be either a success or failure for each trial, and the
probability of success is the same for each trial.

3. There are a finite number of trials.


Just like the geometric distribution, you’re running a series of independent 
trials, and each one can result in success or failure. The difference is that 
this time you’re interested in the number of successes. 

Let’s use the variable X to represent the number of successful 
outcomes out of n trials. To find the probability there are r successes, 
use: 


$P(X = r) = {}^nC_r \, p^r q^{n-r}$

where

$^nC_r = \frac{n!}{r!\,(n - r)!}$

p is the probability of a successful outcome in each trial, and n is the number
of trials. We can write this as

X ~ B(n, p)
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Whafs the expectation awd variance? 

So far we’ve looked at how to use the binomial distribution to find basic 
probabilities, which allows us to calculate the probability of getting a certain 
number of questions correct. But how many questions can we actually expect to 
get right if we choose the answers at random? That will help you better decide 
whether we should answer the next round of questions. 


Let’s see if we can find a general expression for the expectation and variance. 
We’ll start by working out the expectation and variance for a single trial, and then 
see if we can extend it to n independent trials. 

Let’s look at one trial

Suppose we conduct just one trial. Each trial can only result in success or
failure, so in one trial, it’s possible to have 0 or 1 successes. If X ~ B(1, p),
the probability of 1 success is p, and the probability of 0 successes is q.

x           0    1
P(X = x)    q    p

We can use this to find the expectation and variance of X. Let’s start with the expectation.

$E(X) = 0q + 1p$
$= p$

$Var(X) = E(X^2) - E^2(X)$
$= (0^2 q + 1^2 p) - p^2$
$= p - p^2$
$= p(1 - p)$
$= pq$

E(X) = p, so $E^2(X) = p^2$.




So for a single trial, E(X) = p and Var(X) = pq. But what if there are n trials? 





In general, what happens to the expectation and variance when there are n 
independent observations? How can this help us now? 
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aa] Puzzjc 


Let’s see if you can derive the expectation
and variance for X ~ B(n, p). Your job
is to take elements from the pool
and place them into the blank lines
of the calculations. You may not
use the same element more than
once, and you won’t need to use all
the elements.

Hint: Each $X_i$ is a separate trial. $E(X_i) = p$,
and $Var(X_i) = pq$

You need to find the expectation and variance
of n independent trials.

$E(X) = E(X_1) + E(X_2) + \dots + E(X_n)$
= ____ $E(X_i)$
= ____

$Var(X) = Var(X_1) + Var(X_2) + \dots + Var(X_n)$
= ____ $Var(X_i)$
= ____

Pool: npq

Note: each element in
the pool can only be
used once!
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Paa] puzzjc ^alufiaii 

Let’s see if you can derive the expectation
and variance for X ~ B(n, p). Your job
is to take elements from the pool
and place them into the blank lines
of the calculations. You may not
use the same element more than
once, and you won’t need to use all
the elements.

Hint: Each $X_i$ is a separate trial. $E(X_i) = p$,
and $Var(X_i) = pq$

You need to find the expectation and variance
of n independent trials.

$E(X) = E(X_1) + E(X_2) + \dots + E(X_n)$
$= n E(X_i)$
$= np$

Every trial has the same expectation, so $E(X_1) = E(X_2) = E(X_3)$,
and so on.

$Var(X) = Var(X_1) + Var(X_2) + \dots + Var(X_n)$
$= n Var(X_i)$
$= npq$

If X ~ B(n, p), then E(X) = np and Var(X) = npq.
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Pmomial expectation and variance 

Let’s summarize what we just did. First of all, we looked at one trial, where
the probability of success is p and where the distribution is binomial.
Using this, we found the expectation and variance of a single trial.

We then considered n independent trials, and used shortcuts to find the
expectation and variance of n trials. We found that if X ~ B(n, p),

E(X) = np

Var(X) = npq

This is useful to know as it gives us a quick way of finding the expectation
and variance of any binomial distribution, without us having to work out
lots of individual probabilities.
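To see that the shortcuts agree with the long way round, the sketch below computes the expectation and variance directly from the individual probabilities and compares them with np and npq. The function name is hypothetical, and n = 5 with p = 0.25 is simply the Round Two setup used as an example.

```python
from math import comb


def binomial_mean_and_variance(n, p):
    """Compute E(X) and Var(X) the long way, from every P(X = r)."""
    probs = [comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    mean_of_squares = sum(r ** 2 * pr for r, pr in enumerate(probs))
    return mean, mean_of_squares - mean ** 2


n, p = 5, 0.25
mean, variance = binomial_mean_and_variance(n, p)

print(mean, n * p)                  # long way versus np
print(variance, n * p * (1 - p))    # long way versus npq
```

Each printed pair should agree up to floating-point rounding, which is the point of the shortcut: two multiplications replace a sum over every possible outcome.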


There are no Dumb Questions


The geometric distribution and the 
binomial distribution seem similar. What’s 
the difference between them? Which one 
should I use when? 

The geometric and binomial 
distributions do have some things 
in common. Both of them deal with 
independent trials, and each trial can result 
in success or failure. The difference between 
them lies in what you actually need to find 
out, and this dictates which probability 
distribution you need to use. 

If you have a fixed number of trials and you 
want to know the probability of getting a 
certain number of successes, you need to 
use the binomial distribution. You can also 
use this to find out how many successes you 
can expect to have in your n trials. 

If you’re interested in how many trials you’ll 
need before you have your first success, 
then you need to use the geometric 
distribution instead. 


The geometric distribution has a 
mode. Does the binomial distribution? 

Yes, it does. The mode of a probability 
distribution is the value with the highest 
probability. If p is 0.5 and n is even, the 
mode is np. If p is 0.5 and n is odd it has two 
modes, the two values either side of np. For 
other values of n and p, finding the mode is 
a matter of trial and error, but it’s generally 
fairly close to np. 

So for both the geometric and the 
binomial distributions you run a series 
of trials. Does the probability of success 
have to be the same for each trial? 

In order for the geometric or binomial 
distribution to be applicable, the probability 
of success in each trial must be the same. 

If it’s not, then neither the geometric nor 
binomial distribution is appropriate. 


I’ve tried calculating E(X) and 
it’s not a value that’s in the probability 
distribution. Did I do something wrong? 

When you calculate E(X), the result 
may not be a possible value in your 
probability distribution. It may not be a value 
that can actually occur. If you get a result 
like this, it doesn’t mean that you’ve made a 
mistake, so don’t worry. 

Are there any other sorts of 
probability distribution? 

Yes, there are. Keep reading and you’ll 
find out more. 







Your quick guide to the binomial distribution 

Here’s a quick summary of everything you could possibly need to know about the binomial distribution.

When do I use it?

Use the binomial distribution if you’re running a fixed number of independent trials, each one can have a success
or failure, and you’re interested in the number of successes or failures.


How do I calculate probabilities? 

Use

P(X = r) = nCr × p^r × q^(n-r)

with

nCr = n! / (r! (n - r)!)


where p is the probability of success in a trial, q = 1 - p, n is the number of trials, and X is the number of 
successes in the n trials. 


What about the expectation and variance? 

E(X) = np 


Var(X) = npq 
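
The quick guide translates almost line for line into code. The Python sketch below is an illustration only: the function name binomial_probability and the values n = 4 and p = 0.5 are assumptions made for the example, not part of the guide.

```python
from math import comb

def binomial_probability(r, n, p):
    """Return P(X = r) where X ~ B(n, p)."""
    q = 1 - p                        # probability of failure in a single trial
    return comb(n, r) * p**r * q**(n - r)

# Illustrative values (assumed): 4 trials, probability of success 0.5
n, p = 4, 0.5
for r in range(n + 1):
    print("P(X = %d) = %.4f" % (r, binomial_probability(r, n, p)))

print("E(X)   =", n * p)
print("Var(X) =", n * p * (1 - p))
```

Here math.comb plays the role of nCr, so there is no need to work out the factorials yourself.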
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In the latest round of Who Wants To Win A Swivel Chair, there are 5 questions. The probability of 
getting a successful outcome in a single trial is 0.25 


1. What’s the probability of getting exactly two questions right? 


2. What’s the probability of getting exactly three questions right? 


3. What’s the probability of getting two or three questions right? 


4. What’s the probability of getting no questions right? 


5. What are the expectation and variance?

Exercise
Solution


In the latest round of Who Wants To Win A Swivel Chair, there are 5 questions. The probability of 
getting a successful outcome in a single trial is 0.25 

1. What’s the probability of getting exactly two questions right?

If X is the number of questions answered correctly, then X ~ B(n, p) with n = 5 and p = 0.25.

P(X = 2) = 5C2 × 0.25^2 × 0.75^3

         = 5!/(3! 2!) × 0.0625 × 0.421875

         = 10 × 0.0264

         = 0.264


2. What’s the probability of getting exactly three questions right?

P(X = 3) = 5C3 × 0.25^3 × 0.75^2

         = 5!/(2! 3!) × 0.015625 × 0.5625

         = 10 × 0.00879

         = 0.0879


So, you can 
expect to get 
less than 2 questions 
correct? I think now’s 
about time to quit. 
Sorry you won’t win the 
swivel chair, though. 


3. What’s the probability of getting two or three questions right?

P(X = 2 or X = 3) = P(X = 2) + P(X = 3)

                  = 0.264 + 0.0879

                  = 0.3519


4. What’s the probability of getting no questions right?

P(X = 0) = 0.75^5

         = 0.237



5. What are the expectation and variance?

E(X) = np                Var(X) = npq

     = 5 × 0.25                 = 5 × 0.25 × 0.75

     = 1.25                     = 0.9375


Sharpen your pencil

Solution


Here are the questions for Round Two. The questions are all about
the game show host.


1. What was the name of his first girlfriend?

   Mary
   Maggie


2. What would be an ideal gift for him?

   statue
   A horse
   A tin dog
   A hovercraft


3. What is his greatest achievement?

   A: Hosting a quiz show
   B: Winning Mr Statsville 2008
   Raising $1000 for the seal sanctuary
   Releasing an album


4. What is his secret ambition?

   To launch a range of sports equipment
   B: To release an exercise DVD
   To launch his own range of menswear
   To have his own hair care range


5. In what year was he abducted by aliens?

It’s been great having you as a contestant on the
show, and we’d love to have you back later on. But
we’ve just had a phone call from the Statsville
cinema. Some problem about popcorn...?













introducing the poisson distribution 


The Statsville Cinema has a problem



Where’s my popcorn?
I want popcorn now! 
Give me my popcorn! 


It’s a fact of life that cinemagoers like popcorn. 


The trouble is that the popcorn machine at the Statsville Cinema keeps 
breaking down, and the customers aren’t happy. 


The cinema has a big promotion on next week, and the cinema manager 
needs everything to be perfect. He doesn’t want the popcorn machine to 
break down during the week, or people won’t come back. 


The mean number of popcorn machine malfunctions per week, or rate of 
malfunctions, is 3.4. What’s the probability that it won’t break down at all 
next week? 


If they expect the machine to break down more than a few times next week, 
the Statsville Cinema will buy a new popcorn machine, but if not, they’ll 
stick with the current one and run the risk of a breakdown. 


It's a different sort of distribution 

This is a different sort of problem from the ones we’ve encountered so far. 


This time there’s no series of attempts or trials. Instead, we have a situation 
where we know the rate at which malfunctions happen, and where 
malfunctions occur at random. 


So how do we find probabilities? 

The trouble with this sort of problem is that while we know the mean 
number of popcorn machine malfunctions per week, the actual number 
of breakdowns varies each week. On the whole we can expect 3 or 4 
malfunctions per week, but in a bad week there’ll be far more, and in a good 
week there might be none at all. 

We need to find the probability that the popcorn machine won’t break down 
next week. 

Sound difficult? Don’t worry, there’s a probability distribution that’s 
designed for just this sort of situation. It’s called the Poisson distribution. 
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geometric, binomial, and poisson distributions 


The Poisson distribution covers situations where:

■ Individual events occur at random and independently in a given
  interval. This can be an interval of time or space, for example,
  during a week, or per mile.

■ You know the mean number of occurrences in the interval or the
  rate of occurrences, and it’s finite. The mean number of occurrences
  is normally represented by the Greek letter λ (lambda).

Let’s use the variable X to represent the number of occurrences in
the given interval, for instance the number of breakdowns in a week. If
X follows a Poisson distribution with a mean of λ occurrences per interval
or rate, we write this as:

X ~ Po(λ)


We’re not going to derive it here, but to find the probability that there are r
occurrences in a specific interval, use the formula:

P(X = r) = e^(-λ) × λ^r / r!

The formula for the probability uses the exponential function e^x, where x
is some number. It’s a standard function available on most calculators, so
even though the formula might look daunting at first, it’s actually quite
straightforward to use in practice.

As an example, if X ~ Po(2),

P(X = 3) = e^(-2) × 2^3 / 3!        (Use the formula with r = 3 and λ = 2.)

         = e^(-2) × 8 / 6

         = e^(-2) × 1.333

         = 0.180

(e is a mathematical constant, approximately 2.718, so you can substitute this
number for e in the Poisson formula. Many scientific calculators have an e^x key
that will calculate powers of e for you.)
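
The same calculation can be done in a couple of lines of code. This Python sketch is an illustration only; the function name poisson_probability is an assumption made for the example. It applies the formula directly and reproduces the worked example with λ = 2 and r = 3.

```python
from math import exp, factorial

def poisson_probability(r, lam):
    """Return P(X = r) where X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

# The worked example: X ~ Po(2), probability of exactly 3 occurrences
print(round(poisson_probability(3, 2), 3))
```

Here exp(-lam) is the programming equivalent of the e^x key on a calculator.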


So if X follows a Poisson distribution, what’s its expectation and variance? 
It’s easier than you might think... 
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finding expectation and variance for poisson 




Expectation and variance for the Poisson distribution 


Finding the expectation and variance for the Poisson distribution is a lot easier
than finding them for other distributions.

If X ~ Po(λ), E(X) is the number of occurrences we can expect to have in a
given interval, so for the popcorn machine, it’s the number of breakdowns we
can expect to have in a typical week. In other words, E(X) is the mean number
of occurrences in the given interval.

Now, if X ~ Po(λ), then the mean number of occurrences is given by λ. In
other words, E(X) is equal to λ, the parameter that defines our Poisson
distribution.

To make things even simpler, the variance of the Poisson distribution is also
given by λ, so if X ~ Po(λ),


I tell you everything 
you need to know about 
the Poisson distribution. 
Expectation, variance, 
the lot. 


E(X) = λ


Var(X) = λ


In other words, if you’re given a Poisson distribution Po(λ), you don’t have to
calculate anything at all to find the expectation and variance. It’s the parameter
of the Poisson distribution itself.



What does the Poisson distribution look like? 

The shape of the Poisson distribution varies depending on the value of λ. If
λ is small, then the distribution is skewed to the right, but it becomes more
symmetrical as λ gets larger.


If λ is an integer, then there are two modes, λ and λ - 1. If λ is not an integer,
then the mode is λ rounded down to the nearest whole number.
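
You can check these claims numerically. The Python sketch below is an illustration only: the function names, the cut-off of 100 occurrences, and the tolerance used to spot tied modes are all assumptions made for the example. It sums the Poisson probabilities to estimate E(X) and Var(X), and picks out the value or values with the highest probability.

```python
from math import exp, factorial

def poisson_probability(r, lam):
    """Return P(X = r) where X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

def summarize(lam, max_r=100):
    """Approximate E(X), Var(X) and the mode(s) by summing over r = 0..max_r."""
    probs = [poisson_probability(r, lam) for r in range(max_r + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
    highest = max(probs)
    # Treat probabilities within a tiny tolerance of the maximum as tied modes
    modes = [r for r, pr in enumerate(probs) if abs(pr - highest) < 1e-12]
    return mean, variance, modes

for lam in (3.4, 4):
    mean, variance, modes = summarize(lam)
    print("lambda =", lam, " E(X) ~", round(mean, 6),
          " Var(X) ~", round(variance, 6), " mode(s):", modes)
```

The sum is cut off at 100 because the probabilities beyond that point are vanishingly small for values of λ like these; a much larger λ would need a larger cut-off.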





BE the machine

Your job is to play like you’re
the popcorn machine and say
what the probability is of you
malfunctioning a particular
number of times next week.

Remember, the mean number of
times you break down in a week
is 3.4.


1. What’s the probability of the machine not malfunctioning next week? 


2. What’s the probability of the machine malfunctioning three times next week? 


3. What’s the expectation and variance of the machine malfunctions? 
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be the popcorn machine solution 



BE the machine Solution

Your job is to play like you’re
the popcorn machine and say
what the probability is of you
malfunctioning a particular
number of times next week.

Remember, the mean number of
times you break down in a week
is 3.4.


Let’s use X to represent the number of
popcorn machine malfunctions in
a week. We have

X ~ Po(3.4)


1. What’s the probability of the machine not malfunctioning next week?

If there are no malfunctions, X must be 0.

P(X = 0) = e^(-3.4) × 3.4^0 / 0!

         = e^(-3.4) × 1 / 1

         = 0.033


Looks like we can
expect the machine to break
down only 3.4 times next week,
so we’ll risk it and skip that new
machine. Don’t tell the moviegoers.


2. What’s the probability of the machine malfunctioning three times next week?

P(X = 3) = e^(-3.4) × 3.4^3 / 3!

         = e^(-3.4) × 39.304 / 6

         = 0.0334 × 6.551

         = 0.219




3. What’s the expectation and variance of the machine malfunctions?

E(X) = λ                Var(X) = λ

     = 3.4                     = 3.4


How come we use λ to represent
the mean for the Poisson distribution?
Why not use μ like we do elsewhere?

We use λ because for the Poisson
distribution, the parameter of the distribution,
expectation and variance are all the same.
It’s a way of making sure we keep everything
neutral.

Where does the formula for the
Poisson distribution come from?

It can actually be derived from the
other distributions, but the mathematics
is quite involved. In practice it’s best to
just accept the formula, and remember the
situations in which it’s useful.

What’s the difference between
the Poisson distribution and the other
probability distributions?

The key difference is that the Poisson
distribution doesn’t involve a series of
trials. Instead, it models the number of
occurrences in a particular interval.

Does λ have to be an integer?

Not at all. λ can be any non-negative
number. It can’t be negative as it’s the mean
number of occurrences in an interval, and
it doesn’t make sense to have a negative
number of occurrences.

What’s that “e” in the formula all
about?

e is a constant in mathematics that
is approximately equal to 2.718. So you can
substitute in 2.718 for e in the formula for
calculating Poisson probabilities.

The constant e is used frequently in calculus,
and it also has many other applications
in everything from calculating compound
interest to advanced probability theory.
Further discussion of e is outside the scope
of this book, though.

I keep getting the wrong answer
when I try to calculate probabilities using
the Poisson distribution. Where am I
going wrong?

There are two main areas where it’s
easy to trip up. The first thing is to make
sure you’re using the right formula. It’s easy
to get the r and the λ mixed up, so make
sure you’ve got them the right way round.

The second thing is to make sure you’re
using the e^x function correctly on your
calculator. One way of doing this is to leave
the e^(-λ) calculation until the end. Calculate
everything else first, then multiply by e^(-λ).


Where’s my drink? I want
a drink to go with my popcorn.
Give me my drink now!


The Statsville Cinema has another problem. 


It’s not just the popcorn machine that keeps breaking down, now the drinks 
machine has begun malfunctioning too. The mean number of breakdowns 
per week of the drinks machine is 2.3. 


The cinema manager can’t afford for anything to go wrong next week when
the promotion is on. What’s the probability that there will be no breakdowns
next week, with either the popcorn machine or the drinks machine?

What’s the probability distribution of the drinks
machine? How can we find the probability that
neither the popcorn machine nor the drinks
machine goes wrong next week?
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x + y poisson distribution 


So what’s the probability distribution?

Let’s take a closer look at this situation. 

We have two machines, a popcorn machine and a drinks machine, and we know 
the mean number of breakdowns of each machine in a week. We want to find 
the probability that there will be no breakdowns next week. 

Here are the distributions of the two machines: 


Popcorn machine

The mean number of breakdowns per week of the popcorn machine is 3.4.

X ~ Po(3.4)


Drinks machine

The mean number of breakdowns per week of the drinks machine is 2.3.

Y ~ Po(2.3)


If X represents the number of breakdowns of the popcorn machine and Y 
represents the number of breakdowns of the drinks machine, then both X and 
Y follow Poisson distributions. What’s more, X and Y are independent. In other 
words, the popcorn machine breaking down has no impact on the probability 
that the drinks machine will malfunction, and the drinks machine breaking down 
has no impact on the probability that the popcorn machine will malfunction. 

We need to find the probability that the total number of malfunctions next week 
is 0. In other words, we need to find 

P(X + Y = 0) 

Think back to the chapter on probabilities. If X and Y are independent variables, 
how can we find probabilities for X + Y? 
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geometric, binomial, and poisson distributions 


Combine Poisson variables 

You saw in previous chapters that if X and Y are independent random
variables, then

E(X + Y) = E(X) + E(Y)

Var(X + Y) = Var(X) + Var(Y)

For independent Poisson variables there’s an even neater result.
If X ~ Po(λ_x) and Y ~ Po(λ_y), then

X + Y ~ Po(λ_x + λ_y)


This means that if X and Y are independent and both follow Poisson distributions,
then so does X + Y. In other words, we can use our knowledge of the way both X and Y
are distributed to find probabilities for X + Y.
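
One way to convince yourself of this is to compute P(X + Y = t) the long way and compare it with the Poisson formula using λ_x + λ_y. The Python sketch below is an illustration only, and its function names are assumptions made for the example. It adds up the probability of every way the total t can be split between the two machines, which relies on X and Y being independent.

```python
from math import exp, factorial

def poisson_probability(r, lam):
    """Return P(X = r) where X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

def probability_of_total(total, lam_x, lam_y):
    """P(X + Y = total) for independent X ~ Po(lam_x) and Y ~ Po(lam_y),
    found by adding up every way of splitting the total between X and Y."""
    return sum(
        poisson_probability(k, lam_x) * poisson_probability(total - k, lam_y)
        for k in range(total + 1)
    )

lam_x, lam_y = 3.4, 2.3   # popcorn machine and drinks machine
for total in range(4):
    long_way = probability_of_total(total, lam_x, lam_y)
    shortcut = poisson_probability(total, lam_x + lam_y)
    print(total, round(long_way, 6), round(shortcut, 6))
```

The two columns should agree, so the shortcut saves you from enumerating the splits each time.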


If X is the number of times the popcorn machine malfunctions 
and Y is the number of times the drinks machine malfunctions, 
then X 〜 Po(3.4) and Y 〜 Po(2.3). 

1. What’s the distribution of X + Y? 



2. Once you’ve found how X + Y is distributed, you can use it to find probabilities. What’s P(X + Y = 0)? 
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sharpen your pencil solution 

Sharpen your pencil

Solution


If X is the number of times the popcorn machine malfunctions 
and Y is the number of times the drinks machine malfunctions, 
then X 〜 Po(3.4) and Y 〜 Po(2.3). 


1. What’s the distribution of X + Y?

X + Y ~ Po(λ_x + λ_y)

λ_x + λ_y = 3.4 + 2.3

          = 5.7

So X + Y ~ Po(5.7).


2. Once you’ve found how X + Y is distributed, you can use it to find probabilities. What’s P(X + Y = 0)?

P(X + Y = 0) = e^(-5.7) × 5.7^0 / 0!

             = e^(-5.7)

             = 0.003
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Dumb Questions 


Does that mean that the probability 
and expectation shortcuts we saw 
earlier in the book work for the Poisson 
distribution too? 

Yes they do. X and Y are independent 
random variables, because the popcorn 
machine malfunctioning does not affect 
the probability that the drinks machine will 
malfunction, and vice versa. This means that 
we can use all of the shortcuts that apply to 
independent variables. 


Why does X + Y follow a Poisson 
distribution? 

X + Y follows a Poisson distribution 
because both X and Y are independent, and 
they both follow a Poisson distribution. 

Both the popcorn machine and drinks
machine malfunction at random but at
a mean rate. This means that together they
also break down at random and at a mean
rate. Together, they still meet the criteria for
the Poisson distribution.


So can we use the distribution of
X + Y in the same way we would any other
Poisson distribution?

Yes, we use it in exactly the same way,
so once you know what the parameter λ is,
you can use it to find probabilities.
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The Case of the Broken Cookies 


Kate works at the Statsville cookie factory, and her job is to make sure 
that boxes of cookies meet the factory’s strict rules on quality control. 


Kate knows that the probability that a cookie is broken is 0.1, and her
boss has asked her to find the probability that there will be 15 broken
cookies in a box of 100 cookies. “It’s easy,” he says. “Just use the
binomial distribution where n is 100, and p is 0.1.”

Kate picks up her calculator, but when she tries to calculate 100!,
her calculator displays an error because the number is too big.
“Well,” says her boss, “you’ll just have to calculate it manually.
But I’m going home now, so have a nice night.”

Kate stares at her calculator, wondering what to do. Then she smiles.
“Maybe I can leave early tonight, after all.”

Within a minute, Kate’s calculated the probability, and she’s managed
to avoid calculating 100! altogether. She picks up her coat and walks
out the door.

How did Kate find the probability so quickly, and avoid
the error on her calculator?

Five Minute Mystery
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The Poisson in disguise 

The Poisson distribution has another use too. Under certain 
circumstances it can be used to approximate the binomial distribution. 




Sometimes it’s simpler to use the Poisson 
distribution than the binomial. 

As an example, imagine if you had to calculate a binomial
probability where n is 3000. At some point you’d need to calculate
3000!, which would be difficult even with a good calculator.
Because of this, it’s useful to know when you can use the Poisson
distribution to accurately approximate the answer instead.

So under what circumstances can we use this, and how? 


Imagine we have a variable X where X ~ B(n, p). We want to find
a set of circumstances where B(n, p) is similar to Po(λ).

Let’s start off by looking at the expectation and variance of the
two distributions. We want to find the circumstances in which the
expectation and variance of the Poisson distribution are like those
of the binomial distribution. In other words, we want

Expectation: λ to be like np
Variance:    λ to be like npq


np and npq are close to each other if q is close to 1 and n is large.

In other words:

X ~ B(n, p) can be approximated by X ~ Po(np)
if n is large and p is small

The approximation is typically very close if n is larger than 50, and p is less
than 0.1.

Exercise


A student needs to take an exam, but hasn’t done any revision for it. He needs to guess the 
answer to each question, and the probability of getting a question right is 0.05. There are 50 
questions on the exam paper. What’s the probability he’ll get 5 questions right? Use the Poisson 
approximation to the binomial distribution to find out. 


there are no

Dumb Questions


Why would I ever want to use the
Poisson distribution to approximate the
binomial distribution?

When n is very large, it can be difficult
to calculate nCr. Some calculators run out
of memory, and the results can be so large
they’re just unwieldy. Using the Poisson
distribution in this way is a way round this
sort of problem.


So when can I use this 
approximation? 

You can use it when n is large (say 
over 50) and p is small (say less than 
0.1). When this is the case, the binomial 
distribution and the Poisson distribution are 
approximately the same. 


Why do we use np as the 
parameter for the Poisson distribution? 

The Poisson distribution takes one
parameter, λ, and E(X) = λ. This means that
if we use the Poisson approximation
of the binomial distribution, we can
substitute in the expectation of the binomial
distribution, np.

Exercise
Solution


A student needs to take an exam, but hasn’t done any revision for it. He needs to guess the 
answer to each question, and the probability of getting a question right is 0.05. There are 50 
questions on the exam paper. What’s the probability he’ll get 5 questions right? Use the Poisson 
approximation to the binomial distribution to find out. 


Let’s use X to represent the number of questions the student gets right. In this problem, n = 50
and p = 0.05, so np = 2.5. This means we can use X ~ Po(2.5) to approximate the probability.

P(X = 5) = e^(-2.5) × 2.5^5 / 5!

         = e^(-2.5) × 97.656 / 120

         = 0.067
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Solved: The Case of the Broken Cookies 

How did Kate find the probability so quickly, and avoid
the Out of Memory error on her calculator?


Kate spotted that even though she needed to use the 
binomial distribution, her values of n and p were 
such that she could approximate the probability using 
the Poisson distribution instead. 


A lot of calculators can’t cope with high factorials, 
and this can sometimes make the binomial distribution 
unwieldy. Knowing how to approximate it with the Poisson distribution 
can sometimes save you quite a bit of time. 
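
Here is how Kate’s shortcut might look in code. The Python sketch below is an illustration only, and the variable names are assumptions made for the example. It uses the values from the story (n = 100, p = 0.1, 15 broken cookies) and computes the Poisson approximation with λ = np. Unlike Kate’s calculator, Python can handle large factorials, so the sketch also computes the exact binomial probability for comparison.

```python
from math import comb, exp, factorial

n, p, r = 100, 0.1, 15        # box size, P(broken), number of broken cookies

# Exact binomial probability: X ~ B(100, 0.1)
exact = comb(n, r) * p**r * (1 - p)**(n - r)

# Poisson approximation: X ~ Po(np), which never needs 100!
lam = n * p
approx = exp(-lam) * lam**r / factorial(r)

print("binomial:", round(exact, 4))
print("Poisson: ", round(approx, 4))
```

The two answers are close but not identical. In this example p is right at the edge of the "less than 0.1" guideline, so a small gap between them is to be expected.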
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geometric, binomial, and poisson distributions 


Anyone for popcorn? 

You’ve covered a lot of ground in this chapter. You’ve built on your 
existing knowledge of probability and statistics by tackling three of the 
most important discrete probability distributions. Moreover, you’ve 
gained a deeper understanding of how probability distributions work 
and the sort of shortcuts you can make to save yourself time and 
produce reliable results, skills that will come in useful in the rest of the 
book. 

So sit back and enjoy the popcorn — you’ve earned it. 



Your quick guide to the Poisson distribution 

Here’s a quick summary of everything you could possibly need to know about the Poisson distribution.

When do I use it?

Use the Poisson distribution if you have independent events such as malfunctions occurring in a given interval,
and you know λ, the mean number of occurrences in a given interval. You’re interested in the number of
occurrences in one particular interval.

How do I calculate probabilities, and the expectation and variance?

Use

P(X = r) = e^(-λ) × λ^r / r!        E(X) = λ        Var(X) = λ

How do I combine independent random variables?

If X ~ Po(λ_x) and Y ~ Po(λ_y), then

X + Y ~ Po(λ_x + λ_y)

What connection does it have to the binomial distribution?

If X ~ B(n, p), where n is large and p is small, then X can be approximated using

X ~ Po(np)
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long exercise 


Long Exercise


Here are some scenarios. Your job is to say which distribution each of them follows, say what 
the expectation and variance are, and find any required probabilities. 


1. A man is bowling. The probability of him knocking all the pins over is 0.3. If he has 10 shots, what’s the probability he’ll
knock all the pins over less than three times? 
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geometric, binomial, and poisson distributions 


2. On average, 1 bus stops at a certain point every 15 minutes. What’s the probability that no buses will turn up in 
a single 15 minute interval? 


3. 20% of cereal packets contain a free toy. What’s the probability you'll need to open fewer than 4 cereal packets 
before finding your first toy? 
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long exercise solution 


Long Exercise
Solution


Here are some scenarios. Your job is to say which distribution each of them follows, say what 
the expectation and variance are, and find any required probabilities. 


1. A man is bowling. The probability of him knocking all the pins over is 0.3. If he has 10 shots, what’s the probability he’ll
knock all the pins over less than three times?


If X is the number of times the man knocks all the pins over, then X ~ B(10, 0.3)


E(X) = np                Var(X) = npq

     = 10 × 0.3                 = 10 × 0.3 × 0.7

     = 3                        = 2.1

For a general probability, P(X = r) = nCr × p^r × q^(n-r)

P(X = 0) = 10C0 × 0.3^0 × 0.7^10

         = 1 × 1 × 0.028

         = 0.028

P(X = 1) = 10C1 × 0.3^1 × 0.7^9

         = 10 × 0.3 × 0.0404

         = 0.121

P(X = 2) = 10C2 × 0.3^2 × 0.7^8

         = 45 × 0.09 × 0.0576

         = 0.233

P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)

         = 0.028 + 0.121 + 0.233

         = 0.382
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geometric, binomial, and poisson distributions 


2. On average, 1 bus stops at a certain point every 15 minutes. What’s the probability that no buses will turn up in
a single 15 minute interval?

If X is the number of buses stopping in a 15 minute interval, then X ~ Po(1)


E(X) = λ                Var(X) = λ

     = 1                       = 1

For a general probability, P(X = r) = e^(-λ) × λ^r / r!


P(X = 0) = e^(-1) × 1^0 / 0!

         = e^(-1) × 1 / 1

         = 0.368

3. 20% of cereal packets contain a free toy. What’s the probability you’ll need to open fewer than 4 cereal packets
before finding your first toy?


If X is the number of cereal packets that need to be opened in order to find your first toy, then X ~ Geo(0.2)


E(X) = 1/p                Var(X) = q/p^2

     = 1/0.2                     = 0.8/0.2^2

     = 5                         = 0.8/0.04

                                 = 20

For a general probability, P(X ≤ r) = 1 - q^r

P(X ≤ 3) = 1 - q^3

         = 1 - 0.8^3

         = 1 - 0.512

         = 0.488
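
If you’d like to check answers like these by machine, the three calculations can be sketched in Python as follows. This is an illustration only, and the variable names are assumptions made for the example.

```python
from math import comb, exp

# 1. Bowling: X ~ B(10, 0.3), P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)
bowling = sum(comb(10, r) * 0.3**r * 0.7**(10 - r) for r in range(3))

# 2. Buses: X ~ Po(1), P(X = 0) = e^(-1) × 1^0 / 0! = e^(-1)
buses = exp(-1)

# 3. Cereal: X ~ Geo(0.2), P(X <= 3) = 1 - q^3 with q = 0.8
cereal = 1 - 0.8**3

print("bowling:", round(bowling, 3))
print("buses:  ", round(buses, 3))
print("cereal: ", round(cereal, 3))
```

The bowling figure can differ from the hand calculation in the third decimal place, because the hand calculation rounds each term before adding them up.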
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bullet points 


BULLET POINTS

■ The geometric distribution
  applies when you run a series of
  independent trials, there can be
  either a success or failure for each
  trial, the probability of success is
  the same for each trial, and the
  main thing you’re interested in
  is how many trials are needed in
  order to get your first success.

■ If the conditions are met for the
  geometric distribution, X is the
  number of trials needed to get the
  first successful outcome, and p is
  the probability of success in a trial,
  then

  X ~ Geo(p)

■ The following probabilities apply if
  X ~ Geo(p):

  P(X = r) = p q^(r-1)
  P(X > r) = q^r
  P(X ≤ r) = 1 - q^r

■ If X ~ Geo(p) then

  E(X) = 1/p
  Var(X) = q/p^2

■ The binomial distribution applies
  when you run a fixed number of
  independent trials, there can be
  either a success or failure for each
  trial, the probability of success is
  the same for each trial, and the
  main thing you’re interested in is
  the number of successes in the n
  independent trials.

■ If the conditions are met for the
  binomial distribution, X is the
  number of successful outcomes
  out of n trials, and p is the
  probability of success in a trial,
  then

  X ~ B(n, p)

■ If X ~ B(n, p), you can calculate
  probabilities using

  P(X = r) = nCr × p^r × q^(n-r)

  where

  nCr = n! / (r! (n - r)!)

■ If X ~ B(n, p), then

  E(X) = np
  Var(X) = npq

■ The Poisson distribution applies
  when individual events occur at
  random and independently in a
  given interval, you know the mean
  number of occurrences in the
  interval or the rate of occurrences
  and this is finite, and you want to
  know the number of occurrences in
  a given interval.

■ If the conditions are met for
  the Poisson distribution, X is
  the number of occurrences in a
  particular interval, and λ is the rate
  of occurrences, then

  X ~ Po(λ)

■ If X ~ Po(λ) then

  P(X = r) = e^(-λ) × λ^r / r!
  E(X) = λ
  Var(X) = λ

■ If X ~ Po(λ_x), Y ~ Po(λ_y) and X and
  Y are independent,

  X + Y ~ Po(λ_x + λ_y)

■ If X ~ B(n, p) where n is large and
  p is small, you can approximate it
  with X ~ Po(np).


8 using the normal distribution


Being Normal





Discrete probability distributions can’t handle every situation. 

So far we’ve looked at probability distributions where we’ve been able to specify exact 
values, but this isn’t the case for every set of data. Some types of data just don’t fit the 
probability distributions we’ve encountered so far. In this chapter, we’ll take a look at
how continuous probability distributions work, and introduce you to one of the most 
important probability distributions in town — the normal distribution. 


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discrete data vs. continuous data 


Discrete data takes exact values...

So far we’ve looked at probability distributions where the data is
discrete. By this we mean the data is composed of distinct numeric
values, and we’ve been able to calculate the probability of each of
these values. As an example, when we looked at the probability 
distribution for the winnings on a slot machine, the possible amounts 
we could win on each game were very precise. We knew exactly what 
amounts of money we could win, and we knew we’d win one of them. 







If data is discrete, it’s numeric and can take only exact values. It’s 
often data that can be counted in some way, such as the number of 
gumballs in a gumball machine, the number of questions answered 
correctly in a game show, or the number of breakdowns in a particular 
period. 
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using the normal distribution 


...but not all numeric data is discrete

It’s not always possible to say what all the values should be in a set of 
data. Sometimes data covers a range, where any value within that range 
is possible. As an example, suppose you were asked to accurately measure 
pieces of string that are between 10 inches and 11 inches long. You could 
have measurements of 10 inches, 10.1 inches, 10.01 inches, and so on, as the 
length could be anything within that range. 

Numeric data like this is called continuous. It’s frequently data that is 
measured in some way rather than counted, and a lot depends on the degree 
of precision you need to measure to. 







The type of data you have affects how you find probabilities. 

So far we’ve only looked at probability distributions that deal with discrete data. 
Using these probability distributions, we’ve been able to find the probabilities of 
exact discrete values. 

The problem is that a lot of real-world problems involve continuous data, and 
discrete probability distributions just don’t work with this sort of data. To find 
probabilities for continuous data, you need to know about continuous data and 
continuous probability distributions. 

Meanwhile, someone has a problem...

frequency and continuous data


[Sketch: a timeline from 0 to 20 minutes. Julie’s dates could arrive at any time; at 20 minutes, she leaves.]


Here’s a sketch of the frequency showing the amount of time Julie 
spends waiting for her date to arrive: 






We need to find probabilities for the amount of time Julie spends waiting for 
her date. Is the amount of time discrete or continuous? Why? How do you 
think we can go about finding probabilities? 


What’s the delay?


Julie is a student, and her best friend keeps trying to get her fixed up on 
blind dates in the hope that she’ll find that special someone. The only 
trouble is that not many of her dates are punctual — or indeed turn up. 

Julie hates waiting alone for her date to arrive, so she’s made herself a rule: 
if her date hasn’t turned up after 20 minutes, then she leaves. 





I have another date tonight. 

I definitely won’t wait for more than 20 
minutes, but I hate standing around. What’s
the probability I’ll be left waiting for more 
than 5 minutes? Can you help? 



We need a probability distribution for continuous data

We need to find the probability that Julie will have to wait for more than 5 
minutes for her date to turn up. The trouble is, the amount of time Julie has to 
wait is continuous data, which means the probability distributions we’ve learned 
thus far don’t apply. 


When we were dealing with discrete data, we were able to produce a specific 
probability distribution. We could do this by either showing the probability of 
each value in a table, or by specifying whether it followed a defined probability 
distribution, such as the binomial or Poisson distribution. By doing this, we 
were able to specify the probability of each possible value. As an example, when 
we found the probability distribution for the winnings per game for one of Fat 
Dan’s slot machines, we knew all of the possible values for the winnings and 
could calculate the probability of each one.


x           -1      4       9       14      19
P(X = x)    0.977   0.008   0.008   0.006   0.001


For continuous data, it’s a different matter. We can no longer give the probability 
of each value because it’s impossible to say what each of these precise values is. 
As an example, Julie’s date might turn up after 4 minutes, 4 minutes 10 seconds, 
or 4 minutes 10.5 seconds. Counting the number of possible options would be 
impossible. Instead, we need to focus on a particular level of accuracy and the 
probability of getting a range of values. 


I get it. For discrete probability 
distributions, we look at the probability of 
getting a particular value; for continuous 
probability distributions, we look at the 
probability of getting a particular range. 


probability density functions


Probability density functions can be used for continuous data 

We can describe the probability distribution of a continuous random 
variable using a probability density function. 


A probability density function f(x) is a function that you can use to find the 
probabilities of a continuous variable across a range of values. It tells us 
what the shape of the probability distribution is. 


Here’s a sketch of the probability density function for the amount of time 
Julie spends waiting for her date to turn up: 


[Sketch: f(x) is a horizontal line between x = 0 and x = 20.]



Can you see how it matches the shape of the frequency? This isn’t 
just a coincidence. 


Probability is all about how likely things are to happen, and the 
frequency tells you how often values occur. The higher the relative 
frequency, the higher the probability of that value occurring. As 
the frequency for the amount of time Julie has to wait is constant 
across the 20 minute period, this means that the probability density 
function is constant too. 
Probability = area


For continuous random variables, probabilities are given by area. To find the 
probability of getting a particular range of values, we start off by sketching the 
probability density function. The probability of getting a particular range of values 
is given by the area under the line between those values. 


As an example, we want to find the probability that Julie has to wait for between 5 
and 20 minutes for her date to turn up. We can find this probability by sketching 
the probability density function, and then working out the area under it where x is 
between 5 and 20. 

[Sketch: the area under f(x) between x = 5 and x = 20 is shaded.]


The total area under the line must be equal to 1, as the total area represents the 
total probability. This is because for any probability distribution, the total probability 
must be equal to 1, and, therefore, the area must be too. 







Let’s use this to help us find the probability that Julie will need to wait for over 5 
minutes for her date to arrive. 





The total area under the line must be 1. What’s the value of f(x)? It’s a
constant value.

finding f(x)


To calculate probability, start by finding f(x). 


Before we can find probabilities for Julie, we need to find f(x), the probability 
density function. 


So far, we know that f(x) is a constant value, and we know that the total area under 
it must be equal to 1. If you look at the sketch of f(x), the area under it forms 
a rectangle where the width of the base is 20. If we can find the height of the 
rectangle, we’ll have the value of f(x). 







We find the area of a rectangle by multiplying its width and height together. 
This means that 

1 = 20 x height 
height = 1/20 
= 0.05 


This means that f(x) must be equal to 0.05, as that ensures the total area under it 
will be 1. In other words, 


f(x) = 0.05 where x is between 0 and 20


Here’s a sketch: 
...then find probability by finding the area

The area under the probability density line between 5 and 20 is a rectangle. 

This means that calculating the area of this rectangle will give us the probability 

P(X > 5). 

P(X > 5) = (20 - 5) x 0.05
         = 0.75              (area = base x height)

So the probability that Julie will have to wait for more than 5 minutes is 0.75. 
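If you’d like to check the rectangle arithmetic, here is a minimal Python sketch. The function names `uniform_density` and `uniform_prob` are made up for this illustration; they aren’t part of the book or of any library. It assumes the density is constant between two limits and 0 everywhere else.

```python
def uniform_density(low, high):
    """Height of a constant density whose total area over [low, high] is 1."""
    return 1.0 / (high - low)


def uniform_prob(a, b, low, high):
    """P(a < X < b) for a constant density on [low, high]: a rectangle's area."""
    # Clip the requested range to the part where the density is non-zero.
    left, right = max(a, low), min(b, high)
    if right <= left:
        return 0.0
    return (right - left) * uniform_density(low, high)


# Julie's waiting time: constant density between 0 and 20 minutes.
height = uniform_density(0, 20)
p_wait_over_5 = uniform_prob(5, 20, 0, 20)
print(height, p_wait_over_5)
```

The two printed values should agree with the height of 0.05 and the probability of 0.75 worked out above.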





Do I have to use area to find 
probability? Can’t I just pick all the 
exact values in that range and add their 
probabilities together? That’s what we
did for discrete probabilities. 




That doesn’t work for continuous probabilities. 

For continuous probabilities, we have to find the probability by calculating the 
area under the probability density line. 

We can’t add together the probability of getting each value within the range 
as there are an infinite number of values. It would take forever. 


The only way we can find the probability for continuous probability 
distributions is to work out the area underneath the curve formed by the 
probability density function. 


When dealing with
continuous data, you
calculate probabilities
for a range of values.

no dumb questions


there are no Dumb Questions

Q: So there’s a function called the probability density function. What’s
probability density?

A: Probability density tells you how high probabilities are across ranges, and
it’s described by the probability density function. It’s very similar to
frequency density, which we encountered back in Chapter 1. Probability density
uses area to tell you about probabilities, and frequency density uses area to
tell you about frequencies.

Q: So aren’t probability density and probability the same thing?

A: Probability density gives you a means of finding probability, but it’s not
the probability itself. The probability density function is the line on the
graph, and the probability is given by the area underneath it for a specific
range of values.

Q: I see, so if you have a chart showing a probability density function, you
find the probability by looking at area, instead of reading it directly off
the chart.

A: Exactly. For continuous data, you need to find probability by calculating
area. Reading probabilities directly off a chart only works for discrete
probabilities.

Q: Doesn’t finding the probability get complicated if you have to calculate
areas? I mean, what if the probability density function is a curve and not a
straight line?

A: It’s still possible to do it, but you need to use calculus, which is why
we’re not expecting you to do that in this book.

The key thing is that you see where the probabilities come from and how to
interpret them.

If you’re really interested in working out probabilities using calculus, by
all means, give it a go. We don’t want to hold you back.

Q: You’ve talked a lot about probability ranges. How do I find the
probability of a precise value?

A: When you’re dealing with continuous data, you’re really talking about
acceptable degrees of accuracy, and you form a range based on these values.
Let’s look at an example:

Suppose you wanted a piece of string that’s 10 inches long to the nearest
inch. It would be tempting to say that you need a piece of string that’s
exactly 10 inches long, but that’s not entirely accurate. What you’re really
after is a piece of string that’s between 9.5 inches and 10.5 inches, as you
want string that’s 10 inches in length to the nearest inch. In other words,
you want to find the probability of the length being in the range 9.5 inches
to 10.5 inches.

Q: But what if I want to find the probability of a precise single value?

A: This may not sound intuitive at first, but it’s actually 0. What you’re
really talking about is the probability that you have a precise value to an
infinite number of decimal places.

If we go back to the string length example, what would happen if you needed a
piece of string exactly 10 inches long? You would need to have a length of
string measuring 10 inches long to the nearest atom and examined under a
powerful microscope.

It’s virtually impossible for the string to be precisely 10 inches long.

Q: But I’m sure that degree of accuracy isn’t needed. Surely it would be
enough to measure it to the nearest hundredth of an inch?

A: Ah, but that brings us back to the degree of accuracy you need in order for
the length to pass as 10 inches, rather than finding the probability of a
value to an infinite degree of precision. You use your degree of accuracy to
construct your range of acceptable measurements so that you can work out the
probability.
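To see why the probability of a single precise value is 0, it can help to watch the range shrink. The sketch below is only an illustration: the string example gives no distribution, so it borrows the constant density on 0 to 20 from Julie’s waiting time as an assumption, and `prob_within` is a hypothetical helper name.

```python
def prob_within(target, tolerance, low=0.0, high=20.0):
    """P(target - tolerance < X < target + tolerance) for a constant density.

    Assumes the density is constant on [low, high] and 0 elsewhere.
    """
    density = 1.0 / (high - low)
    left = max(target - tolerance, low)
    right = min(target + tolerance, high)
    return max(right - left, 0.0) * density


# Tighten the accuracy around the value 10: the range narrows each time.
for tolerance in (0.5, 0.05, 0.005, 0.0):
    print(tolerance, prob_within(10.0, tolerance))
```

As the tolerance gets smaller, the rectangle gets narrower and the probability shrinks with it. With a tolerance of 0 the rectangle has no width, so its area is 0.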
BE the probability density function

A bunch of probability density functions have lost track of their probabilities.
Your job is to play like you’re the probability density function and work out
the probability between the specified ranges. Draw a sketch if you think that
will help.


1. f(x) = 0.05 where 0 < x < 20
Find P(X < 5) 


2. f(x) = 1 where 0 < x < 1 
Find P(X < 0.5) 


3. f(x) = 1 where 0 < x < 1 
Find P(X > 2) 


4. f(x) = 0.1 - 0.005x where 0 < x < 20 
Find P(X > 5) 
BE the probability density function solution


1. f(x) = 0.05 where 0 < x < 20
Find P(X < 5) 



2. f(x) = 1 where 0 < x < 1 
Find P(X < 0.5) 

3. f(x) = 1 where 0 < x < 1
Find P(X > 2) 

The upper limit of x for this probability density function is 1, which means
it’s 0 above this.

P(X > 2) = 0





4. f(x) = 0.1 - 0.005x where 0 < x < 20 
Find P(X > 5) 

When x = 5, f(x) = 0.075. This means we have to find the area of a
right-angled triangle with height 0.075 and width 15.

P(X > 5) = (0.075 x 15)/2
         = 1.125/2
         = 0.5625

The area of a triangle is 1/2 the base multiplied by the height.
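When the density isn’t flat, you can still approximate the area numerically instead of using calculus. The sketch below is one way to do it, assuming the sloping density from question 4. It uses the midpoint rule, which is a technique swapped in here for illustration rather than anything the book uses, and the names `density` and `area_under` are hypothetical.

```python
def density(x):
    """f(x) = 0.1 - 0.005x where 0 < x < 20, and 0 everywhere else."""
    return 0.1 - 0.005 * x if 0 < x < 20 else 0.0


def area_under(f, a, b, steps=100_000):
    """Approximate the area under f between a and b using the midpoint rule."""
    width = (b - a) / steps
    # Sum the heights at the middle of each thin strip, then scale by the width.
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


p_numeric = area_under(density, 5, 20)    # area built from thin strips
p_triangle = 0.5 * (20 - 5) * density(5)  # 1/2 x base x height
total_area = area_under(density, 0, 20)   # should come out very close to 1
print(p_numeric, p_triangle, total_area)
```

Both routes should land on roughly 0.5625, and the total area should be very close to 1, as it must be for any probability density function.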


BULLET POINTS 


■ Discrete data is composed of 
distinct numeric values. 

■ Continuous data covers a range, 
where any value within that range is 
possible. It’s frequently data that is 
measured in some way, rather than 
counted. 

■ Continuous probability distributions 
can be described with a probability 
density function. 


■ You find the probability for a range 
of values by calculating the area 
under the probability density function 
between those values. So to find 
P(a < X < b), you need to calculate 
the area under the probability density 
function between a and b. 

■ The total area under the probability 
density function must equal 1. 


We’ve found the probability

So far, we’ve looked at how you can use probability density functions 
to find probabilities for continuous data. We’ve found that the 
probability that Julie will have to wait for more than 5 minutes for her 
date to turn up is 0.75. 


That’s great, at least
now I have an idea of
how long I’ll be waiting.
But what about my shoes?


height probabilities


Searching for a sole mate

As well as preferring men who are punctual, Julie has preconceived ideas 
about what the love of her life should be like.




Julie loves wearing high-heeled shoes, and the higher the heel, the 
happier she is. The only problem is that she insists that her dates 
should be taller than her when she’s wearing her most extreme set of 
heels, and she’s running out of suitable men. 

Unfortunately, the last couple of times Julie was sent on a blind date, 
the guys fell short of her expectations. She’s wondering how many 
men out there are taller than her and what the probability is that her 
dates will be tall enough for her high standards. 

So how can we work out the probability this time? 


Male modelling 


So far we’ve looked at very simple continuous distributions, but it’s 
unlikely these will model the heights of the men Julie might be dating. 

It’s likely we’ll have several men who are quite a bit shorter than average, 
a few really tall ones, and a lot of men somewhere in between. We can 
expect most of the men to be average height. 




[Sketch: a bell-shaped curve. A few men are shorter than average, a few are taller, and most are close to the average height.]


Given this pattern, the probability density of the height of the men is likely 
to look something like this. 
introducing the normal distribution


The normal distribution is an "ideal" model for continuous data


The normal distribution is called normal because it’s seen as an ideal. It’s 
what you’d “normally” expect to see in real life for a lot of continuous data 
such as measurements. 

The normal distribution is in the shape of a bell curve. The curve is 
symmetrical, with the highest probability density in the center of the curve. 
The probability density decreases the further away you get from the mean. 
Both the mean and median are at the center and have the highest probability 
density. 

The normal distribution is defined by two parameters, μ and σ². μ tells you
where the center of the curve is, and σ gives you the spread. If a continuous
random variable X follows a normal distribution with mean μ and standard
deviation σ, this is generally written X ~ N(μ, σ²).


So what effect do μ and σ really have on the shape of the normal distribution?

We said that μ tells you where the center of the curve is, and σ² indicates the
spread of values. In practice, this means that the larger σ² gets, the flatter
and wider the normal curve becomes.
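You can see the flattening effect by computing the height of the curve at its center for a few variances. This sketch uses the standard formula for the normal probability density, which the chapter doesn’t state, so treat it as background rather than something you need for the exercises. The function name `normal_density` is made up for this example.

```python
import math


def normal_density(x, mean, variance):
    """Probability density of N(mean, variance) at x (standard textbook formula)."""
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


# Height of the curve at its center for increasingly large variances.
variances = (1.0, 4.0, 20.25)
peaks = [normal_density(0.0, 0.0, v) for v in variances]
for v, peak in zip(variances, peaks):
    print(f"variance = {v}: peak height = {peak:.4f}")
```

The peak gets lower as the variance grows. Since the total area has to stay at 1, a lower peak means the curve must spread out wider.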





If the probability density 
decreases the further you get 
from μ, when does it reach 0?


No matter how far you go out on the graph, the 
probability density never equals 0. 

The probability density gets closer and closer to 0, but never quite 
reaches it. If you looked at the probability density curve a very long 
way from μ, you’d find that the curve just skims above 0.

Another way of looking at this is that events become more and more 
unlikely to occur, but there’s always a tiny chance they might. 


So how do we find normal probabilities? 



As with any other continuous probability distribution, you find 
probabilities by calculating the area under the curve of the 
distribution. The curve gives the probability density, and the 
probability is given by the area between particular ranges. If, for 
instance, you wanted to find the probability that a variable X lies 
between a and b, you’d need to find the area under the curve between 
points a and b. 



[Sketch: the shaded area gives the probability that X is between a and b.]


Sound complicated? Don’t worry, it’s easier than you might think. 

Working out the area under the normal curve would be difficult if you had to 
do it all by yourself, but fortunately you have a helping hand in the form of 
probability tables. All you need to do is work out the range of the area you 
want to find, and then look up the corresponding probability in the table. 
finding normal probabilities


Three steps to calculating normal probabilities 



There are a few steps you need to take in order to find normal 
probabilities. We’ll guide you through the process, but for now here’s 
a roadmap of where we’re headed. 



1. Grab your distribution and range

2. Standardize it

3. Look up the probabilities


Step 1: Determine your distribution


The first thing we need to do is determine the distribution of the data. 

Julie has been given the mean and variance of the heights of eligible
men in Statsville. The mean is 71 inches, and the variance is 20.25 square
inches. This means that if X represents the heights of the men, X ~ N(71, 20.25).




[Sketch: the bell curve of X ~ N(71, 20.25). The variable X follows a normal distribution with mean μ = 71 and variance σ² = 20.25.]




We also need to know which range of values will give us the right probability 
area. In this case, we need to find the probability that Julie’s blind date will be 
sufficiently tall. 



That’s easy. Julie wants her date to be taller
than her, so we can work out probabilities based 
on her height. 



Julie is 64 inches tall, so we’ll find the probability that her date is taller. 
Here’s a sketch: 


[Sketch: the bell curve centered at μ = 71, with the area to the right of x = 64 shaded. This area is the probability that her date is taller than 64 inches.]


standardizing normal variables


Step 2: Standardize to N(0, 1)

The next step is to standardize our variable X so that the mean becomes 0
and the standard deviation 1. This gives us a standardized normal variable
Z where Z ~ N(0, 1).



Probability tables only give probabilities for N(0, 1). 

Probability tables focus on giving the probabilities for N(0, 1) distributions, as 
it would be impossible to produce probability tables for every single normal 
distribution curve. There are an infinite number of possible values for μ and
σ², and as the normal curve uses these as parameters to indicate the center
and spread of the curve, there are also an infinite number of possible normal 
distribution curves. 




Being able to use a standard normal distribution means that we can use the
same set of probability tables for all possible values of μ and σ². There’s just
one question: how do we convert our normal distribution into a standard
form?





How do you think we might be able to standardize our normal distribution? 
To standardize, first move the mean...

Let’s start off by transforming our normal distribution so that the mean 
becomes 0 rather than 71. To do this, we move the curve to the left by 71. 



[Sketch: move the curve to the left by 71.]





This gives us a new distribution of 
X - 71 ~ N(0, 20.25)


...then squash the width


We also need to adjust the variance. To do this, we “squash” our distribution
by dividing by the standard deviation. We know the variance is 20.25, so the
standard deviation is 4.5. (Recall that the standard deviation is the square
root of the variance.)

Doing this gives us

(X - 71)/4.5 ~ N(0, 1)

or Z ~ N(0, 1) where

Z = (X - 71)/4.5


Look familiar? This is the standard score we encountered when 
we first looked at the standard deviation in Chapter 3. In general, 
you can find the standard score for any normal variable X using 


Z = (X - μ)/σ

where X is the variable we’re trying to find probabilities for, μ is the mean
of X, and σ is the standard deviation of X.


finding Z


Now find Z for the specific value you want to find probability for 


So far we’ve looked at how our probability distribution can be standardized to 
get from X ~ N(μ, σ²) to Z ~ N(0, 1). What we’re most interested in is actual
probabilities. What we need to do is take the range of values we want to find 
probabilities for, and find the standard score of the limit of this range. Then 
we can look up the probability for our standard score using normal probability 
tables. 


In our situation, we want to find the probability that Julie’s date is taller than 
her. Since Julie is 64 inches tall, we need to find P(X > 64). The limit of this 
range is 64, so if we calculate the standard score z of 64, we’ll be able to use 
this to find our probability. 


[Sketch: the area where X > 64 under N(71, 20.25) is the same as the area where Z > z under N(0, 1). We need to find the standard score z.]

Let’s find the standard score of 64. 


z = (x - μ)/σ
  = (64 - 71)/4.5
  = -1.56 (to 2 decimal places)


So -1.56 is the standard score of 64, using the mean and standard 
deviation of the men’s heights in Statsville. 

Now that we have this, we can move onto the final step, using 
tables to look up the probability. 
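The standard score calculation is easy to reproduce in code. Here is a minimal sketch, assuming the distribution is described by its mean and variance as in N(μ, σ²). The function name `standard_score` is made up for this example.

```python
import math


def standard_score(x, mean, variance):
    """Return z = (x - mean) / sigma, where sigma is the square root of the variance."""
    sigma = math.sqrt(variance)
    return (x - mean) / sigma


# Julie's height of 64 inches against the men's heights, N(71, 20.25).
z = standard_score(64, 71, 20.25)
print(round(z, 2))
```

Rounding to 2 decimal places matches what the probability tables expect and should give the same -1.56 found above.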


To find the standard score of a value, use z = (x - μ)/σ.



Is this the same standard score that we saw before? 

Yes it is. It has more uses than just the normal distribution, but 
it’s particularly useful here as it allows us to use standard normal 
probability tables. 


Is the probability for my standardized range really the 
same as for my original distribution? How does that work? 

The probabilities work out the same, but using the probability 
tables is a lot more convenient. 


When we standardize our original normal distribution, everything 
keeps the same proportion. The overall area doesn’t grow or shrink, 
and as it’s area that gives the probability, the probability stays the 
same too. 


Sharpen your pencil


It’s time to standardize. We’ll give you a distribution and value,
and you have to tell us what the standard score is.

1. N(10, 4), value 6


2. N(6.3, 9), value 0.3


3. N(2,4). If the standard score is 0.5, what's the value? 


4. The standard score of value 20 is 2. If the variance 
is 16, what’s the mean? 
Sharpen your pencil solution


It’s time to standardize. We’ll give you a distribution and value,
and you have to tell us what the standard score is.


1. N(10, 4), value 6

z = (x - μ)/σ
  = (6 - 10)/2
  = -2


2. N(6.3, 9), value 0.3

z = (x - μ)/σ
  = (0.3 - 6.3)/3
  = -2


3. N(2, 4). If the standard score is 0.5, what’s the value?

This is the reverse of the previous problems. We’re given the standard score,
and we have to find the original value. We can do this by substituting in the
values we know, and finding x.

z = (x - μ)/σ
0.5 = (x - 2)/2
0.5 x 2 = x - 2
x = 1 + 2
  = 3


4. The standard score of value 20 is 2. If the
variance is 16, what’s the mean?

This is a similar problem to question 3. We have to substitute in the values
we know to find μ.

z = (x - μ)/σ
2 = (20 - μ)/4
2 x 4 = 20 - μ
μ = 20 - 8
  = 12
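Questions 3 and 4 run the standard score formula backward. As a rough sketch, rearranging z = (x - μ)/σ gives x = μ + zσ and μ = x - zσ. The helper names below are hypothetical and exist only for this illustration.

```python
import math


def value_from_score(z, mean, variance):
    """Rearranged standard score: x = mean + z * sigma."""
    return mean + z * math.sqrt(variance)


def mean_from_score(x, z, variance):
    """Rearranged standard score: mean = x - z * sigma."""
    return x - z * math.sqrt(variance)


answer_3 = value_from_score(0.5, 2, 4)  # question 3: N(2, 4), z = 0.5
answer_4 = mean_from_score(20, 2, 16)   # question 4: value 20, z = 2, variance 16
print(answer_3, answer_4)
```

These should agree with the hand-worked answers of 3 and 12.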
So we’ve found our
distribution, standardized it,
and found Z. Now can we find
the probability of my blind date
being taller than me?


Step 3: Look up the probability in your handy table


Now that we have a standard score, we can use probability tables to find 
our probability. Standard normal probability tables allow you to look up 
any value z, and then read off the corresponding probability P(Z < z). 


[Sketch: P(Z < z) is the area under the curve to the left of z.]




We’ve put all 
the probability 
tables you need in 
Appendix ii of the 
book. 


Just flip to pages 658-659 for the normal 
distribution tables you need to find 
probabilities in this chapter. 


So how do you use probability tables? 


Start off by calculating z to 2 decimal places. This is the value that you 
will need to look up in the table. 


To look up the probability, you need to use the first column and the top 
row to find your value of z. The first column gives the value of z to 
1 decimal place (without rounding), and the top row gives the second 
decimal place. The probability is where the two intersect. 

As an example, if you wanted to find P(Z < -3.27), you’d find -3.2
in the first column, .07 in the top row, and read off a probability of
0.0005.
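If you don’t have the printed tables handy, you can reproduce a table entry in code. The sketch below assumes the standard identity P(Z < z) = 0.5 x (1 + erf(z/√2)), which isn’t covered in this chapter, and uses Python’s built-in `math.erf`. The names `phi` and `table_lookup` are made up for this example.

```python
import math


def phi(z):
    """P(Z < z) for Z ~ N(0, 1), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def table_lookup(z):
    """Mimic a printed table: z to 2 decimal places, probability to 4."""
    return round(phi(round(z, 2)), 4)


print(table_lookup(-3.27))
```

This should give the same 0.0005 that you read off where the -3.2 row meets the .07 column.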


   z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 -3.4   .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
 -3.3   .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
 -3.2   .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
 -3.1   .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
 -3.0   .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
 -2.9   .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
 -2.8   .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
 -2.7   .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
 -2.6   .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
 -2.5   .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
 -2.4   .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064

The row for z = -3.2x (where x is some number) and the column for .07, the
second decimal place for z, meet at .0005. This gives the value of
P(Z < -3.27).
using probability tables


Julie’s probability is in the table

Let’s go back to our problem with Julie. We want to find P(Z > -1.56), so let’s 
look up -1.56 in the probability table and see what this gives us. 


   z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 -3.4   .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
 -3.3   .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
 -3.2   .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
 -3.1   .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
 -3.0   .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
 -2.9   .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
 -2.8   .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
 -2.7   .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
 -2.6   .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
 -2.5   .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
 -2.4   .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
 -2.3   .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
 -2.2   .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
 -2.1   .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
 -2.0   .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
 -1.9   .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
 -1.8   .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
 -1.7   .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
 -1.6   .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
 -1.5   .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
 -1.4   .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
 -1.3   .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
 -1.2   .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
 -1.1   .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170

The row for z = -1.5x (where x is some number) and the column for .06, the
second decimal place of z, meet at .0594. This gives the value of P(Z < z).


So, looking up the value of -1.56 in the probability table gives us a probability of
0.0594. In other words, P(Z < -1.56) = 0.0594. The total probability is 1, so
this means that

P(Z > -1.56) = 1 - P(Z < -1.56)
             = 1 - 0.0594
             = 0.9406

In other words, the probability that Julie’s date is taller than her is 0.9406.

There’s a 94%
chance my date
will be taller than
me? I like those
odds!
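All three steps for Julie’s problem can be strung together in a few lines. This is a sketch under a stated assumption: instead of a printed table it uses the identity P(Z < z) = 0.5 x (1 + erf(z/√2)) with Python’s built-in `math.erf`. The name `phi` is made up for this example.

```python
import math


def phi(z):
    """P(Z < z) for Z ~ N(0, 1), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Step 1: the distribution and range. X ~ N(71, 20.25), and we want P(X > 64).
mean, variance, julie_height = 71, 20.25, 64

# Step 2: standardize, rounding z to 2 decimal places as the tables require.
z = round((julie_height - mean) / math.sqrt(variance), 2)

# Step 3: the table gives P(Z < z), so subtract it from the total probability.
p_taller = 1 - phi(z)
print(z, round(p_taller, 4))
```

Rounding z first keeps the result in line with the table-based answer of 0.9406. Skipping the rounding would shift the last decimal places slightly.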


Probability tables allow you to look up the probability P(Z < z) where 
z is some value. The problem is you don’t always want to find this 
sort of probability; sometimes you want to find the probability that a 
continuous random variable is greater than z, or between two values. 
How can you use probability tables to find the probability you need? 

The big trick is to find a way of using the probability tables to get to 
what you want, usually by finding a whole area and then subtracting 
what you don’t need. 


Probability Tables Up Close


Finding P(Z > z) 

We can find probabilities of the form P(Z > z) using

P(Z > z) = 1 - P(Z < z)

In other words, take the area where Z < z away from the total probability.



Finding P(a < Z < b) 

Finding this sort of probability is slightly more complicated to calculate, 
but it’s still possible. You can calculate this sort of probability using 

P(a < Z < b) = P(Z < b) - P(Z < a)

In other words, calculate P(Z < b), and take away the area for P(Z < a). 
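
The two rules above can be sketched as a pair of small helper functions. This is only an illustration: the function names `p_greater` and `p_between` are hypothetical, and `NormalDist().cdf` from Python's standard library plays the role of the probability table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

def p_greater(z):
    """P(Z > z): the whole area (1) minus the area where Z < z."""
    return 1 - Z.cdf(z)

def p_between(a, b):
    """P(a < Z < b): the area below b minus the area below a."""
    return Z.cdf(b) - Z.cdf(a)

print(round(p_greater(-1.56), 4))
print(round(p_between(-0.15, 0.5), 4))
```

Both helpers only ever call `cdf`, which mirrors the idea of rewriting every probability in the form P(Z < z).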


There are no Dumb Questions


Q: I’ve heard of the term “Gaussian.” What’s that?

A: Another name for the normal distribution is the Gaussian distribution. If
you hear someone talking about a Gaussian distribution, they’re talking about
the same thing as the normal distribution.

Q: Are all normal probability tables the same?

A: All normal probability tables give the same probabilities for your values.
However, there’s some variation between tables as to what’s actually covered
by them.

Q: Variation? What do you mean?

A: Some tables and exam boards use different degrees of accuracy in their
probability tables. Also, some show the tables in a slightly different format,
but still give the same information.

Q: So what should I do if I’m taking a statistics exam?

A: First of all, check what format of probability table will be available to you
while you’re sitting the exam. Then, see if you can get a copy.

Once you have a copy of the probability tables used by your exam board, spend
time getting used to using them. That way you’ll be off to a flying start when
the exam comes around.

Q: Finding the probability of a range looks kinda tricky. How do I do it?

A: The big thing here is to think about how you can get the area you want using
the probability tables. Probability tables generally only give probabilities in
the form P(Z < z) where z is some value. The big trick, then, is to rewrite your
probability only in these terms.

If you’re dealing with a probability in the form P(a < Z < b), that is, some
sort of range, you’ll have two probabilities to look up, one for P(Z < a) and
the other for P(Z < b). Once you have these probabilities, subtract the
smallest from the largest.

Q: Do continuous distributions have a mode? Can you find the mode of the
normal distribution?

A: Yes. The mode of a continuous probability distribution is the value where
the probability density is highest. If you draw the probability density, it’s
the value of the highest point of the curve.

If you look at the curve of the normal distribution, the highest point is in
the middle. The mode of the normal distribution is μ.

Q: What about the median?

A: The median of a continuous probability distribution is the value a where
P(X < a) = 0.5. In other words, it’s the value that cuts the area under the
probability density curve in half.

For the normal distribution, the median is also μ. The median and mode don’t
get used much when we’re dealing with continuous probability distributions.
Expectation and variance are more important.

Q: What’s a standard score?

A: The standard score of a variable is what you get if you subtract its mean
and divide by its standard deviation. It’s a way of standardizing normal
distributions so that they are transformed into a N(0, 1) distribution, and
that gives you a way of comparing them. Standard scores are useful when you’re
dealing with the normal distribution because it means you can look up the
probability of a range using standard normal probability tables.

The standard score of a particular value also describes how many standard
deviations away from the mean the value is, which gives you an idea of its
relative proximity to the mean.


Sharpen your pencil

It’s time to put your probability table skills to the test. See if you
can solve the following probability problems.


1. P(Z < 1.42)


2. P(-0.15 < Z < 0.5)


3. P(Z > z) = 0.1423. What’s z?


Sharpen your pencil Solution

It’s time to put your probability table skills to the challenge. See
if you can solve the following probability problems.


1. P(Z < 1.42)

We can find this probability by looking up 1.42 in the probability tables.
This gives us

P(Z < 1.42) = 0.9222


2. P(-0.15 < Z < 0.5)

For this one, look up P(Z < 0.5) and subtract P(Z < -0.15).

P(-0.15 < Z < 0.5) = P(Z < 0.5) - P(Z < -0.15)
                   = 0.6915 - 0.4404
                   = 0.2511


3. P(Z > z) = 0.1423. What’s z?

This is a slightly different problem. We’re given the probability, and need
to find the value of z.

We know P(Z > z) = 0.1423, which means that

P(Z < z) = 1 - 0.1423
         = 0.8577

The next thing to do is find which value of z has a probability of 0.8577.
Looking this up in the probability tables gives us

z = 1.07

so

P(Z > 1.07) = 0.1423
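
As a cross-check on the three answers, here is a short sketch that swaps the printed tables for Python's `statistics.NormalDist`. Using `inv_cdf` for the reverse lookup in problem 3 is our choice of tool, not something the book prescribes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

# 1. P(Z < 1.42): a direct lookup
p1 = Z.cdf(1.42)

# 2. P(-0.15 < Z < 0.5): area below 0.5 minus area below -0.15
p2 = Z.cdf(0.5) - Z.cdf(-0.15)

# 3. P(Z > z) = 0.1423, so P(Z < z) = 1 - 0.1423.
#    inv_cdf reads the table "backwards", from probability to z.
z3 = Z.inv_cdf(1 - 0.1423)

print(round(p1, 4), round(p2, 4), round(z3, 2))
```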
Exercise

Wait a sec, if I wear my 
5-inch heels, I’m much taller. 
Won’t that affect the probability 
that my date is taller than me? 


Julie has a problem. When we calculated the probability of her date 
being taller than her, we failed to take her high heels into account. 
See if you can find the probability of Julie’s date being taller than 
her while she’s wearing shoes with 5 inch heels. 

As a reminder, Julie is 64 inches tall and X ~ N(71, 20.25) where X
is the height of men in Statsville.


Exercise Solution

Julie has a problem. When we calculated the probability of her date being taller than her, we
failed to take her high heels into account. See if you can find the probability of Julie’s date being
taller than her while she’s wearing shoes with 5 inch heels.

As a reminder, Julie is 64 inches tall and X ~ N(71, 20.25) where X is the height of men in
Statsville.

Julie is wearing 5 inch heels, so her height is now 69 inches. We need to find P(X > 69).

We need to start by finding the standard score of 69 so we can use probability tables to look up
probabilities. The variance is 20.25, so the standard deviation σ is 4.5.

z = (x - μ) / σ
  = (69 - 71) / 4.5
  = -2 / 4.5
  = -0.44 (to 2 decimal places)

Now we’ve found z, we need to find P(Z > z), i.e. P(Z > -0.44).

P(Z > -0.44) = 1 - P(Z < -0.44)
             = 1 - 0.3300
             = 0.67

So the probability that Julie’s date is taller while she’s wearing shoes with a 5 inch heel is 0.67.


So, I can wear my highest heels,
and there’s still a 67% chance
he’ll be taller? Sweet!
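
The same calculation can be sketched in code. This assumes Python's `statistics.NormalDist` in place of the tables; because it does not round z to two decimal places first, the result differs very slightly from the table-based 0.67.

```python
from statistics import NormalDist

# Heights of men in Statsville: X ~ N(71, 20.25), so sigma = sqrt(20.25) = 4.5
men = NormalDist(mu=71, sigma=20.25 ** 0.5)

julie_in_heels = 64 + 5  # Julie's height plus 5 inch heels

# Standard score of 69 inches
z = (julie_in_heels - men.mean) / men.stdev

# P(X > 69) = 1 - P(X < 69)
p_taller = 1 - men.cdf(julie_in_heels)

print(round(z, 2), round(p_taller, 2))
```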
















The Case of the Missing Parameters 

Will at Manic Mango Games has a problem. He needs to give his 
boss the mean and standard deviation of the number of minutes 
people take to complete level one of their new game. This shouldn’t 
be difficult, but unfortunately a ferocious terrier has eaten the piece of 
paper he wrote them on. 


Five Minute Mystery



Will only has three clues to help him. 


First of all, Will knows that the number of minutes people spend 
playing level one follows a normal distribution. 

Secondly, he knows that the probability of a player playing for 
less than 5 minutes is 0.0045. 


Finally, the probability of someone taking less than 15 minutes to 
complete level one is 0.9641. 


How can Will find the mean and standard deviation?


The Case of the Missing Parameters: Solved

How can Will find the mean and standard deviation?

Will can use probability tables and standard scores to get expressions for
the mean and standard deviation that he can then solve.

First of all, we know that P(X < 5) = 0.0045. From
probability tables, P(Z < z1) = 0.0045 where z1 = -2.61, which
means that the standard score of 5 is -2.61. If we put
this into the standard score formula, we get

-2.61 = (5 - μ) / σ

Similarly, P(X < 15) = 0.9641, which means that the standard score of
15 is 1.8. This gives us

1.8 = (15 - μ) / σ

This gives us two equations we can solve to find μ and σ. This is a pair of
simultaneous equations.

-2.61σ = 5 - μ

1.8σ = 15 - μ

If we subtract the first equation from the second, we get

1.8σ + 2.61σ = 15 - μ - 5 + μ

4.41σ = 10
σ = 2.27

If we then substitute this into the second equation, we get

1.8 x 2.27 = 15 - μ

μ = 15 - 4.086
  = 10.914

In other words, μ = 10.914 and σ = 2.27.
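
Will's detective work can be sketched as a short script. It is an illustration under our own assumptions: `NormalDist().inv_cdf` stands in for reading the tables backwards, and because it does not round the standard scores to two decimal places, the results differ from the hand calculation in the third decimal place.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

# Clues: P(X < 5) = 0.0045 and P(X < 15) = 0.9641
z1 = Z.inv_cdf(0.0045)   # standard score of 5, roughly -2.61
z2 = Z.inv_cdf(0.9641)   # standard score of 15, roughly 1.8

# z1 * sigma = 5 - mu and z2 * sigma = 15 - mu.
# Subtracting the first equation from the second eliminates mu.
sigma = (15 - 5) / (z2 - z1)

# Substitute sigma back into the second equation to get mu.
mu = 15 - z2 * sigma

print(round(mu, 2), round(sigma, 2))
```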





And they all lived happily ever after


Just as the odds predicted, Julie’s latest blind date was a success! Julie had
to make sure her intended soulmate was compatible with her shoes, so she
made sure she wore her highest heels to put him to the test. What’s more,
he was already at the venue when she arrived, so she didn’t have to wait
around.


The first thing he said to me
was how much he liked my shoes.
We’re clearly made for each other.

(We’re not sure whether she’s referring to her date or her shoes.)


But it doesn’t stop there.


Keep reading and we’ll show you more things you can do 
with the normal distribution. You’ve only just scratched the 
surface of what you can do. 


BULLET POINTS

■ The normal distribution forms the shape of a symmetrical bell curve. It’s
  defined using N(μ, σ²).

■ To find normal probabilities, start by identifying the probability range you
  need. Then find the standard score for the limit of this range using

  Z = (X - μ) / σ   where Z ~ N(0, 1).

■ You find normal probabilities by looking up your standard score in
  probability tables. Probability tables give you the probability of getting
  this value or lower.


9 using the normal distribution ii


Beyond Normal



If only all probability distributions were normal. 

Life can be so much simpler with the normal distribution. Why spend all your time 
working out individual probabilities when you can look up entire ranges in one swoop, and
still leave time for game play? In this chapter, you’ll see how to solve more complex 
problems in the blink of an eye, and you’ll also find out how to bring some of that normal 
goodness to other probability distributions. 




Love is a roller coaster 

The wedding market is big business nowadays, and Dexter has an idea for 
making that special day truly memorable. Why get married on the ground when 
you can get married on a roller coaster? 

Dexter’s convinced there’s a lot of money to be made from his innovative Love 
Train ride, if only it passes the health and safety regulations. 






I need to make sure 
the combined weight of 
the bride and groom won’t 
be above 380 pounds. 
Think you can help?

Before Dexter can go any further, he needs to make
sure that his special ride can cope with the weight 
of the bride and groom, and he’s asked if you can 
help him. 

The ride he has in mind can cope with combined weights of up to 380 
pounds. What’s the probability that the combined weight will be less 
than this? 








All aboard the Love Train 

Before we start, we need to know how the weights of brides and grooms in 
Statsville are distributed, taking into account the weight of all their wedding 
clothes. Both follow a normal distribution, with the bride weight distributed 
as N(150, 400) and the groom weight as N(190, 500). Their weights are 
measured in pounds. 



[Figure: two normal curves, bride weight centered at 150 and groom weight centered at 190]


We need to use these two probability distributions to somehow work out 
the probability that the weight of a bride and groom will be less than the 
maximum weight allowance on the ride. If the probability is sufficiently 
high, we can be confident the ride is feasible. 





How do you think we can find the probability 
distribution for the combined weights of the bride and 
groom? What sort of distribution do you think this 
might be? Why?


Normal bride + normal groom

Let’s start by taking a closer look at how the weights of the bride and groom
are distributed.

As you know, the weights follow normal distributions like this: 


[Figure: normal curves for the Bride (centered at 150) and the Groom (centered at 190)]




What we’re really after, though, is the probability distribution of the 
combined weight of the bride and groom. In other words, we want to find 
the probability distribution of the weight of the bride added to the weight 
of the groom. 


Bride weight + Groom weight ~ ?


Assuming the weights of the bride and groom are independent, the 
shape of the distribution should look something like this: 


[Figure: a single, wider normal curve for Bride + Groom, annotated to show there’s a lot of variation in the combined weight]


It’s still just weight


Can you remember when we first looked at continuous data and looked at how data 
such as height and weight tend to be distributed? We found that data such as height 
and weight are continuous, and they also tend to follow a normal distribution.

This time we’re looking at the combined weight of the happy couple. Even though 
it’s combined weight, it’s still just weight, and we already know how weight tends to be 
distributed. The combined weight is still continuous. What’s more, the combined 
weight is still distributed normally. In other words, the combined weight of the 
bride and groom follows a normal distribution. 

Knowing that the combined weight of the bride and groom follows a normal 
distribution helps us a lot. It means that we’ll be able to use probability tables just 
like we did before to look up probabilities, which means we’ll be able to look up the 
probability that the combined weight is less than 380 pounds — just what we need for 
the ride. 


There’s only one problem — before we can go any further, we need to know the mean 
and variance of the combined weight of the bride and groom. How can we find this? 

Bride weight + Groom weight ~ N(?, ?)


Sharpen your pencil

It’s time for a trip down memory lane. Can you remember the
discrete shortcuts for the following formulas? Assume X and Y are
independent.

1. E(X + Y)

2. Var(X + Y)

3. E(X - Y)

4. Var(X - Y)


Sharpen your pencil Solution

It’s time for a trip down memory lane. Can you remember the
discrete shortcuts for the following formulas? Assume X and Y are
independent.

1. E(X + Y)

E(X + Y) = E(X) + E(Y)

2. Var(X + Y)

Var(X + Y) = Var(X) + Var(Y)

3. E(X - Y)

E(X - Y) = E(X) - E(Y)

4. Var(X - Y)

Var(X - Y) = Var(X) + Var(Y)



I don’t see how these 
shortcuts help us. They're 
for discrete data, and we’re
dealing with continuous now. 


The shortcuts apply to continuous data too. 

When we originally encountered these shortcuts, we were dealing with discrete data. 
Fortunately, the same rules and shortcuts also apply to continuous data. 



How do you think we can use these shortcuts to find the probability 
distribution of the weight of the bride + the weight of the groom?


How's the combined weight distributed? 


So far, we’ve found that the combined weight of the bride and groom is
normally distributed, and this means we can use probability tables to look up
the probability of the combined weight being less than a certain amount.

Let’s try rewriting the bride and groom weight distributions in terms of X
and Y. If X represents the weight of the bride and Y the weight of the groom,
and X and Y are independent, then we want to find μ and σ where


X + Y ~ N(μ, σ²)




In other words, before we go any further we need to find the mean and variance of 
X + Y. But how? 

Take a look at the answers to the last exercise. When we were working with discrete 
probability distributions, we saw that as long as X and Y are independent we could 
work out E(X + Y) and Var(X + Y) by using 


E(X + Y) = E(X) + E(Y) and Var(X + Y) = Var(X) + Var(Y) 

So if we know what the expectation and variance of X and Y are, we can use these 
to work out the expectation and variance of X + Y.



We can use what we already know to figure out
what we don’t.

Because we know how the weight of the bride and the weight of the 
groom are distributed, we can find the distribution of the combined 
weight of the bride and groom. 

Let’s look at this in more detail. 


That means that if we 
know the distribution of X 
and Y, we can figure out the
distribution of X + Y too.


X + Y Distribution Up Close

Being able to find the distribution of X + Y is useful if you’re working
with combinations of normal variables. If independent random variables
X and Y are normally distributed, then X + Y is normal too. What’s
more, you can use the mean and variance of X and Y to calculate the
distribution of X + Y.

(Remember, two variables are independent if neither has any effect on the
other’s probabilities.)

To find the mean and variance of X + Y, you can use the same formulae
that we used for discrete probability distributions. In other words, if

X ~ N(μ_x, σ_x²) and Y ~ N(μ_y, σ_y²)

then

X + Y ~ N(μ, σ²)

where

μ = μ_x + μ_y        σ² = σ_x² + σ_y²

We can use these shortcuts if X and Y are independent, which makes life
very easy indeed.

In other words, the mean of X + Y is equal to the mean of X plus the
mean of Y, and the variance of X + Y is equal to the variance of X plus
the variance of Y.

Let’s look at a sketch of this. What do you notice about the variance of
X + Y?

[Figure: normal curves for X ~ N(μ_x, σ_x²) and Y ~ N(μ_y, σ_y²), and a wider, flatter curve for X + Y ~ N(μ_x + μ_y, σ_x² + σ_y²) centered at μ_x + μ_y]

The variance of X + Y is greater than the variance of X and also
greater than the variance of Y, which means that the curve of X + Y
is more elongated than either. This is true for any normal X and Y.

By adding the two variables together, you are in effect increasing the 
amount of variability, and this elongates the shape of the distribution. 
This in turn means that the shape of the distribution gets flatter so that 
the total area under the curve is still 1. 
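
Here is a minimal sketch of the rule, using the bride and groom distributions from earlier in the chapter. It relies on Python's `statistics.NormalDist`, whose `+` operator between two instances treats them as independent normal variables; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

# NormalDist takes a standard deviation, so pass the square root of each variance.
bride = NormalDist(mu=150, sigma=sqrt(400))   # X ~ N(150, 400)
groom = NormalDist(mu=190, sigma=sqrt(500))   # Y ~ N(190, 500)

# Sum of two independent normal variables: means add, variances add.
combined = bride + groom

print(combined.mean, round(combined.variance, 6))
```

Notice that the variance of the sum is larger than either individual variance, which is what makes the combined curve wider and flatter.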
X - Y Distribution Up Close

Sometimes X + Y just won’t give you the sorts of probabilities you’re
after. If you need to find probabilities involving the difference between two
variables, you’ll need to use X - Y instead.

X - Y follows a normal distribution if X and Y are independent random
variables and are both normally distributed. These are exactly the same
criteria as for X + Y.

To find the mean and variance, we again use the same shortcuts that we
used for discrete probability distributions. If

X ~ N(μ_x, σ_x²) and Y ~ N(μ_y, σ_y²)

then

X - Y ~ N(μ, σ²)

where

μ = μ_x - μ_y        σ² = σ_x² + σ_y²

(We add the variances, just as we did for X + Y.)

In other words, the mean of X - Y is equal to the mean of Y subtracted
from the mean of X, and you find the variance of X - Y by adding the X
and Y variances together.

Adding the variances together may not make intuitive sense at first,
but it’s exactly the same as when we worked with discrete probability
distributions. Even though we’re subtracting Y from X, we’re actually
still increasing the amount of variability. Adding the variances together
reflects this. As with the X + Y distribution, this leads to a flatter, more
elongated shape than either X or Y.

[Figure: normal curve for X - Y ~ N(μ_x - μ_y, σ_x² + σ_y²)]





If you look at the actual shape of the X - Y distribution, it’s the same 
shape curve as for X + Y distribution, except that the center has moved. 
The two distributions have the same variances, but different means. 
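
To see that X + Y and X - Y share a variance but not a mean, here is a small sketch. The two distributions are hypothetical values picked so the arithmetic is easy to follow, and `statistics.NormalDist` is assumed as the tool.

```python
from statistics import NormalDist

# Hypothetical independent variables, chosen only for illustration
x = NormalDist(mu=10, sigma=3)   # X ~ N(10, 9)
y = NormalDist(mu=4, sigma=4)    # Y ~ N(4, 16)

total = x + y        # means add, variances add
difference = x - y   # means subtract, variances still add

print(total.mean, difference.mean)
print(round(total.variance, 6), round(difference.variance, 6))
```

Both results have variance 9 + 16 = 25; only the center of the curve moves.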
Finding probabilities

Now that we know how to calculate the distribution of X + Y, we
can look at how to use it to calculate probabilities. Here are the
steps you need to go through.

1. Work out the distribution and range

2. Standardize

3. Look up the probabilities


Sound familiar? These are exactly the same steps that 
we went through in the previous chapter for the normal 
distribution. 


There are no Dumb Questions

Q: Remind me, why did we need to find the distribution of X + Y?

A: We’re looking for the probability that the combined weight of a bride and
groom will be less than 380 pounds, which means we need to know how the
combined weight is distributed. We’re using X to represent the weight of the
bride, and Y to represent the weight of the groom, which means we need to use
the distribution of X + Y.

Q: You say we can look up probabilities for X + Y using probability tables.
How?

A: In exactly the same way as we did before. We take our probability
distribution, calculate the standard score, and then look this value up in
probability tables.

Looking up probabilities for X + Y is no different from looking up
probabilities for anything else. Just find the standard score, look it up, and
that gives you your probability.

Q: So do all of the shortcuts we learned for discrete data apply to continuous
data too?

A: Yes, they do. This means we have an easy way of combining random variables
and finding out how they’re distributed, which in turn means we can solve more
complex problems.

The key thing to remember is that these shortcuts apply as long as the random
variables are independent.

Q: Can you remind me what independent means?

A: If two variables are independent, then their probabilities are not affected
by each other. In our case, we’re assuming that the weight of the bride is not
influenced by the weight of the groom.

Q: What if X and Y aren’t independent? What then?

A: If X and Y aren’t independent, then we can’t use these shortcuts. We’d need
to do a lot more work to find out how X + Y is distributed because we’d have to
find out what the relationship is between X and Y.


Sharpen your pencil


Find the probability that the combined weight of the bride and 
groom is less than 380 pounds using the following three steps. 


1. X is the weight of the bride and Y is the weight of the groom, where X ~ N(150, 400) and Y ~ N(190, 500). With this
information, find the probability distribution for the combined weight of the bride and groom. 


2. Then, using this distribution, find the standard score of 380 pounds. 


3. Finally, use the standard score to find P(X + Y < 380).


Sharpen your pencil Solution

Find the probability that the combined weight of the bride and
groom is less than 380 pounds using the following three steps.


1. X is the weight of the bride and Y is the weight of the groom, where X ~ N(150, 400) and Y ~ N(190, 500). With this
information, find the probability distribution for the combined weight of the bride and groom.

We need to find the probability distribution of X + Y. To find the mean and variance of X + Y, we add the
means and variances of the X and Y distributions together. This gives us

X + Y ~ N(340, 900)


2. Then, using this distribution, find the standard score of 380 pounds.

z = ((x + y) - μ) / σ
  = (380 - 340) / 30
  = 40 / 30
  = 1.33 (to 2 decimal places)

(Remember how before we used z = (x - μ) / σ? This time around we’re using the distribution
of X + Y, so we use z = ((x + y) - μ) / σ.)


3. Finally, use the standard score to find P(X + Y < 380).

If we look 1.33 up in standard normal probability tables, we get a probability of 0.9082. This means

P(X + Y < 380) = 0.9082
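
The three steps can be sketched end to end in code. This assumes Python's `statistics.NormalDist` rather than printed tables, so the standard score is not rounded to 1.33 before the lookup and the probability comes out a touch higher than 0.9082.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: work out the distribution of X + Y
bride = NormalDist(mu=150, sigma=sqrt(400))
groom = NormalDist(mu=190, sigma=sqrt(500))
combined = bride + groom            # X + Y ~ N(340, 900)

# Step 2: standardize the 380 pound limit
z = (380 - combined.mean) / combined.stdev

# Step 3: look up P(Z < z) in the standard normal distribution
p_ok = NormalDist().cdf(z)

print(round(z, 2), round(p_ok, 4))
```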
Exercise




Julie’s matchmaker is at it again. What’s the probability that a man will be at least 5 inches taller 
than a woman? 

In Statsville, the height of men in inches is distributed as N(71, 20.25), and the height of women 
in inches is distributed as N(64, 16).


Exercise Solution

Julie’s matchmaker is at it again. What’s the probability that a man will be at least 5 inches taller
than a woman?

In Statsville, the height of men in inches is distributed as N(71, 20.25), and the height of women
in inches is distributed as N(64, 16).

Let’s use X to represent the height of the men and Y to represent the height of the women. This
means that

X ~ N(71, 20.25) and Y ~ N(64, 16)

We need to find the probability that a man is at least 5 inches taller than a woman. This means we
need to find

P(X > Y + 5)

or

P(X - Y > 5)

To find the mean and variance of X - Y, we take the mean of Y from the mean of X, and add the
variances together. This gives us

X - Y ~ N(7, 36.25)

We need to find the standard score of 5 inches.

z = ((x - y) - μ) / σ
  = (5 - 7) / 6.02
  = -0.33 (to 2 decimal places)

We can use this to find P(X - Y > 5).

P(X - Y > 5) = 1 - P(X - Y < 5)
             = 1 - 0.3707
             = 0.6293
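
Here is the same exercise as a short sketch, again assuming `statistics.NormalDist`, whose `-` operator between two instances treats them as independent. The unrounded standard score gives a result within about 0.001 of the table answer.

```python
from math import sqrt
from statistics import NormalDist

men = NormalDist(mu=71, sigma=sqrt(20.25))   # X ~ N(71, 20.25)
women = NormalDist(mu=64, sigma=sqrt(16))    # Y ~ N(64, 16)

# X - Y ~ N(71 - 64, 20.25 + 16) = N(7, 36.25)
gap = men - women

# P(X - Y > 5) = 1 - P(X - Y < 5)
p_at_least_5 = 1 - gap.cdf(5)

print(gap.mean, round(gap.variance, 2), round(p_at_least_5, 4))
```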
More people want the Love Train


It looks like there’s a good chance that the combined weight of the happy 
couple will be less than the maximum the ride can take. But why restrict 
the ride to the bride and groom? 


Customers are demanding 
that we allow more members of the 
wedding party to join the ride, and 
they’ll pay good money. That’s great, but
will the Love Train be able to handle 
the extra load? 


Let’s see what happens if we add another car for four more members of 
the wedding party. These could be parents, bridesmaids, or anyone else 
the bride and groom want along for the ride. 

The car will hold a total weight of 800 pounds, and we’ll assume the 
weight of an adult in pounds is distributed as 

X ~ N(180, 625)

where X represents the weight of an adult. But how can we work out the 
probability that the combined weight of four adults will be less than 800 
pounds? 






Think back to the shortcuts you can use when you calculate expectation 
and variance. What’s the difference between independent observations and 
linear transformations? What effect does each have on the expectation and 
variance? Which is more appropriate for this problem?


Linear transforms describe underlying changes in values


Let’s start off by looking at the probability distribution of 4X, where X is 
the weight of one adult. Is 4X appropriate for describing the probability 
distribution for the weight of 4 people? 

The distribution of 4X is actually a linear transform of X. It’s a 
transformation of X in the form aX + b, where a is equal to 4, and b is equal 
to 0. This is exactly the same sort of transform as we encountered earlier with 
discrete probability distributions. 

Linear transforms describe underlying changes to the size of the values in the 
probability distribution. This means that 4X actually describes the weight of 
an individual adult whose weight has been multiplied by 4. 

[Figure: one adult scaled up in size, labeled 1X, 2X, and 4X]


So what’s the distribution of a linear transform?

Suppose you have a linear transform of X in the form aX + b, where
X ~ N(μ, σ²). As X is distributed normally, this means that aX + b is distributed
normally too. But what are the expectation and variance?

Let’s start with the expectation. When we looked at discrete probability
distributions, we found that E(aX + b) = aE(X) + b. Now, X follows a normal
distribution where E(X) = μ, so this gives us E(aX + b) = aμ + b.

We can take a similar approach with the variance. When we looked at discrete
probability distributions, we found that Var(aX + b) = a²Var(X). We know that
Var(X) in this case is given by Var(X) = σ², so this means that Var(aX + b) = a²σ².

Putting both of these together gives us

aX + b ~ N(aμ + b, a²σ²)

In other words, the new mean becomes aμ + b, and the new variance becomes a²σ².
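
A brief sketch of the linear transform rule follows, using the adult weight distribution X ~ N(180, 625). It assumes `statistics.NormalDist`, which supports multiplying by and adding a constant; the transform 2X + 10 is a made-up example to show the effect of b.

```python
from statistics import NormalDist

adult = NormalDist(mu=180, sigma=25)   # X ~ N(180, 625)

# 4X: one adult whose weight has been multiplied by 4 (a = 4, b = 0)
four_x = adult * 4

# 2X + 10: a hypothetical transform with a = 2, b = 10
shifted = adult * 2 + 10

print(four_x.mean, four_x.variance)    # a*mu + b and a^2 * sigma^2
print(shifted.mean, shifted.variance)
```

Note that b moves the mean but has no effect on the variance.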
So what about independent observations?


...and independent observations describe how many values you have

Rather than transforming the weight of each adult, what we really need
to figure out is the probability distribution for the combined weight of
four separate adults. In other words, we need to work out the probability
distribution of four independent observations of X.

[Figure: one adult is X; two adults are X + X; three are X + X + X; four are X + X + X + X]

The weight of each adult is an observation of X, so this means that the
weight of each adult is described by the probability distribution of X. We
need to find the probability distribution of four independent observations of
X, so this means we need to find the probability distribution of

X1 + X2 + X3 + X4

where X1, X2, X3 and X4 are independent observations of X.


Expectation and variance for independent observations

When we looked at the expectation and variance of independent observations of
discrete random variables, we found that

E(X1 + X2 + ... + Xn) = nE(X)

and

Var(X1 + X2 + ... + Xn) = nVar(X)

As you’d expect, these same calculations work for continuous random variables too.

This means that if X ~ N(μ, σ²), then

X1 + X2 + ... + Xn ~ N(nμ, nσ²)


There are no Dumb Questions


Q: So what’s the difference between linear transforms and independent
observations?

A: Linear transforms affect the underlying values in your probability
distribution. As an example, if you have a length of rope of a particular
length, then applying a linear transform affects the length of the rope.

Independent observations have to do with the quantity of things you’re dealing
with. As an example, if you have n independent observations of a piece of
rope, then you’re talking about n pieces of rope.

In general, if the quantity changes, you’re dealing with independent
observations. If the underlying values change, then you’re dealing with a
transform.

Q: Do I really have to know which is which? What difference does it make?

A: You have to know which is which because it makes a difference in your
probability calculations. You calculate the expectation for linear transforms
and independent observations in the same way, but there’s a big difference in
the way the variance is calculated. If you have n independent observations
then the variance is n times the original. If you transform your probability
distribution as aX + b, then your variance becomes a² times the original.

Q: Can I have both independent observations and linear transforms in the same
probability distribution?

A: Yes you can. To work out the probability distribution, just follow the
basic rules for calculating expectation and variance. You use the same rules
for both discrete and continuous probability distributions.
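
The difference in variance is easy to see side by side. In this sketch, which assumes `statistics.NormalDist`, adding instances models independent observations (the library treats the operands of `+` as independent even when they are the same object), while multiplying by 4 models the linear transform 4X.

```python
from statistics import NormalDist

adult = NormalDist(mu=180, sigma=25)   # X ~ N(180, 625)

# Linear transform: one adult's weight multiplied by 4
four_x = adult * 4

# Four independent observations: X1 + X2 + X3 + X4
four_adults = adult + adult + adult + adult

print(four_x.mean, four_adults.mean)                   # both are 4 * 180
print(round(four_x.variance), round(four_adults.variance))
```

The means match, but 4X has variance 16 x 625 while four separate adults have variance 4 x 625.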


BULLET POINTS

■ If X ~ N(μ_x, σ_x²) and
Y ~ N(μ_y, σ_y²), and X and Y
are independent, then

X + Y ~ N(μ_x + μ_y, σ_x² + σ_y²)

X - Y ~ N(μ_x - μ_y, σ_x² + σ_y²)

■ If X ~ N(μ, σ²) and a and b are
numbers, then

aX + b ~ N(aμ + b, a²σ²)

■ If X1, X2, ..., Xn are
independent observations of
X where X ~ N(μ, σ²), then

X1 + X2 + ... + Xn ~ N(nμ, nσ²)
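The difference between a linear transform and independent observations can be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist` class. The distribution N(180, 625) is borrowed from the Love Train exercise purely for illustration, and the variable names are assumptions rather than anything from the text.

```python
from statistics import NormalDist

# X ~ N(180, 625). NormalDist takes the standard deviation, so sigma = 25.
x = NormalDist(mu=180, sigma=25)

# Linear transform 4X: one value scaled by 4, so the variance is 4^2 * 625.
scaled = x * 4

# Four independent observations X1 + X2 + X3 + X4: the variance is 4 * 625.
# Adding NormalDist objects treats them as independent variables.
total = x + x + x + x

print(scaled.mean, scaled.variance)
print(total.mean, total.variance)
```

Both results have the same mean, but the variance of the linear transform is four times larger than the variance of the sum of independent observations.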
using the normal distribution ii

Let’s solve Dexter’s Love Train dilemma. What’s the probability that the combined weight of 4
adults will be less than 800 pounds? Assume the weight of an adult is distributed as N(180, 625).
Exercise Solution

Let’s solve Dexter’s Love Train dilemma. What’s the probability that the combined weight of 4
adults will be less than 800 pounds? Assume the weight of an adult is distributed as N(180, 625).

If we represent the weight of an adult as X, then X ~ N(180, 625). We need to start by finding
how the weight of 4 adults is distributed. To find the mean and variance of this new distribution,
we multiply the mean and variance of X by 4. This gives us

X1 + X2 + X3 + X4 ~ N(720, 2500)

To find P(X1 + X2 + X3 + X4 < 800), we start by finding the standard score.

z = (x - μ) / σ
  = (800 - 720) / 50
  = 80 / 50
  = 1.6

Looking this value up in standard normal probability tables gives us a value of 0.9452. This means that

P(X1 + X2 + X3 + X4 < 800) = 0.9452
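As a cross-check on the table lookup, here is a short sketch that computes the same probability with Python's standard library. It assumes the four weights are independent, as the exercise does; the variable names are illustrative only.

```python
from statistics import NormalDist

# Combined weight of 4 independent adults, each N(180, 625):
# mean = 4 * 180 = 720, variance = 4 * 625 = 2500, so sigma = 50.
combined = NormalDist(mu=4 * 180, sigma=(4 * 625) ** 0.5)

z = (800 - combined.mean) / combined.stdev   # standard score
p_under_800 = combined.cdf(800)              # P(X1 + X2 + X3 + X4 < 800)

print(round(z, 2), round(p_under_800, 4))
```

The computed probability should agree with the probability table value to four decimal places.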
We interrupt this chapter to bring you

Hello, and welcome back to Who Wants
To Win A Swivel Chair, Statsville's favorite
quiz show. We've got some more fiendishly
difficult questions on tonight's show.














We've got some more great questions lined
up for you today, so let's get on with the show. In
this round I'm going to ask you forty questions, and you
need to get thirty or more right to get through to the
next round. Or you can walk away and take a consolation
prize. For each question there are four possible answers.
The title of this round is "Even More About Me."

Good luck! 



Sharpen your pencil

Here are the first five questions for Round Two.
The questions are all about the game show host.

1. What is his favorite film?
   A: The Day of The Jackal
   B: The Italian Job
   C: Lawrence of Arabia
   D: All the President's Men

2. What is the favorite film of his cat?
   A: A Fish Called Wanda
   B: Curse of the Were-Rabbit
   C: Mousehunt
   D: Bird on a Wire

3. On average, how much does he spend on suits each month?
   A: $1,000
   B: $2,000
   C: $3,000
   D: $4,000

4. How often does he have his hair cut?
   A: Once a month
   B: Twice a month
   C: Three times a month
   D: Four times a month

5. What is his favorite web site?
   A: www.fatdanscasino.com
   B: www.gregs-list.net
   C: www.you-cube.net
   D: www.starbuzzcoffee.com
Should we play, or walk away?

As before, it’s unlikely you’ll know the game show host well enough to 
answer questions about him. It looks like you’ll need to give random 
answers to the questions again. 

So what’s the probability of getting 30 or more questions right out of 40? 

That will help us determine whether to keep playing, or walk away. 

Sharpen your pencil

How would you find the probability of getting at least 30 out of 
40 questions correct? What steps would you need to go through 
to get the right answer? How would you find the mean and 
variance? 

We’re not asking you to find the probability ― just say how you’d 
go about finding it. 
Sharpen your pencil Solution

How would you find the probability of getting at least 30 out of
40 questions correct? What steps would you need to go through
to get the right answer? How would you find the mean and
variance?

We’re not asking you to find the probability, just say how you’d
go about finding it.

There are 40 questions, which means there are 40 trials. The outcome of each trial can be a success or failure,
and we want to find the probability of getting a certain number of successes. In order to do this, we need to use
the binomial distribution. We use n = 40, and as each question has four possible answers, p is 1/4 or 0.25.

If X is the number of questions we get right, then we want to find P(X ≥ 30). This means we have to calculate and
add together the probabilities for P(X = 30) up to P(X = 40).

We find the mean and variance using n, p and q, where q = 1 - p. The mean is equal to np, and the variance is
equal to npq. This gives us a mean of 40 x 0.25 = 10, and a variance of 40 x 0.25 x 0.75 = 7.5.



But doing all of those 
calculations is going to be 
horrible. Isn’t there an 
easier way? 


Using the binomial distribution can be a lot of work. 

In order to find the probability that we answer 30 or more questions 
correctly, we need to add together 11 individual probabilities. Each of 
these probabilities is tricky to find, and it would be very easy to make a 
mistake somewhere along the way. 

What we really need is an easier way of calculating binomial 
probabilities. 
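To get a feel for how much work the binomial route involves, here is a minimal sketch that adds up the 11 individual probabilities with Python's standard library. The helper name `binomial_pmf` is made up for this illustration.

```python
from math import comb

n, p = 40, 0.25        # 40 questions, 4 possible answers each
q = 1 - p

mean = n * p           # np
variance = n * p * q   # npq

def binomial_pmf(r: int) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * q**(n - r)

# P(X >= 30) needs the 11 terms from P(X = 30) up to P(X = 40).
terms = [binomial_pmf(r) for r in range(30, 41)]
p_at_least_30 = sum(terms)

print(mean, variance, len(terms), p_at_least_30)
```

A computer makes light work of this, but by hand each term needs a factorial calculation and two large powers, which is why a shortcut is so attractive.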
Wouldn't it be dreamy if there was a
way of making other distributions as easy
to work with as the normal? But I know it's
just a fantasy...

Normal distribution to the rescue

We’ve seen that life with the binomial distribution can be tough at times. Some of 
the calculations can be tricky and repetitive, which in turn means that it’s easy to 
make mistakes and spend a lot of time only to come up with the wrong answer. 

Sound hopeless? Don’t worry, there’s an easy way out. 

In certain circumstances, you can use the normal distribution to approximate the 
binomial distribution. 





You're saying the normal 
distribution can approximate 
the binomial? I thought the 
Poisson did that. What gives? 


The Poisson distribution can approximate the binomial in 
some situations, but the normal can in others. 

Knowing how to approximate the binomial distribution with other distributions is 
useful because it can cut down on all sorts of complexities, and in some situations 
the Poisson distribution can help us work out some tricky binomial probabilities. 

In certain other circumstances, we can use the normal distribution to approximate 
the binomial instead. There are some huge advantages with this, as it means that 
instead of performing calculations, we can use normal probability tables to simply 
look up the probabilities we need. 

All we need to do is figure out the circumstances under which this works. 





It’s been a while since we looked at how we could use the 
Poisson distribution to approximate the binomial. Under what 
circumstances is it appropriate? 


Answer: When n > 50 and p < 0.1.
BE the distribution

Below you’ll see some binomial
distributions for different values of n
and p. Your job is to play like you’re
the distribution and say which
one you think can best
be approximated by the normal.
Take a good look at the shape of
each distribution and say which
one is most normal.
BE the distribution Solution

Below you’ll see some binomial
distributions for different values of n
and p. Your job is to play like you’re
the distribution and say which
one you think can best be
approximated by the normal.

Take a good look at the shape of
each distribution and say which
one is most normal.

[Charts: binomial distributions for different values of n and p, including p = 0.1 and p = 0.5.]

When to approximate the binomial distribution with the normal

Under certain circumstances, the shape of the binomial distribution looks 
very similar to the normal distribution. In these situations, we can use the 
normal distribution in place of the binomial to give a close approximation of its 
probabilities. Instead of calculating lots of individual probabilities, we can look up 
whole ranges in standard normal probability tables. 


So under what circumstances can we do this? 


We saw in the last exercise that the binomial distribution looks very similar to the
normal distribution where p is around 0.5, and n is around 20. As a general rule,
you can use the normal distribution to approximate the binomial when np and nq
are both greater than 5, where q = 1 - p.


Finding the mean and variance 

Before we can use normal probability tables to look up probabilities, we need to know what the 
mean and variance is so that we can calculate the standard score. We can take these directly from 
the binomial distribution. When we originally looked at the binomial distribution, we found that: 


μ = np


and


σ² = npq


We can use these as parameters for our normal approximation. 


Vital Statistics
Approximating the Binomial Distribution

If X ~ B(n, p) and np > 5 and nq > 5, you can use X ~ N(np, npq) to
approximate it.
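The rule above can be written as a small helper. This is only a sketch: the function name `normal_approximation` and its `threshold` parameter are assumptions introduced for illustration, with the threshold left adjustable because the cut-off is a rule of thumb.

```python
from math import sqrt
from statistics import NormalDist

def normal_approximation(n: int, p: float, threshold: float = 5) -> NormalDist:
    """Return N(np, npq) for X ~ B(n, p) if np and nq both exceed threshold."""
    q = 1 - p
    if n * p <= threshold or n * q <= threshold:
        raise ValueError("np and nq must both be greater than the threshold")
    # NormalDist takes the standard deviation, which is the square root of npq.
    return NormalDist(mu=n * p, sigma=sqrt(n * p * q))

approx = normal_approximation(12, 0.5)   # np = 6 and nq = 6
print(approx.mean, approx.variance)
```

Calling the helper with values such as n = 10 and p = 0.1 raises an error, because np is only 1.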



Watch it!

Some textbooks use criteria of np > 10 and nq > 10.

If you’re taking a statistics exam, make sure you check the criteria used by your exam board.
Long Exercise

Before we use the normal distribution for the full 40 questions for Who Wants To Win A Swivel
Chair, let’s tackle a simpler problem to make sure it works. Let’s try finding the probability that
we get 5 or fewer questions correct out of 12, where there are only two possible choices for
each question.

Let’s start off by working this out using the binomial distribution. Use the binomial distribution
to find P(X < 6) where X ~ B(12, 0.5).

Now let’s try using the normal approximation to the binomial and check that we get the same
result. First of all, if X ~ B(12, 0.5), what normal distribution can we use to approximate this?
Once you’ve found that, what’s P(X < 6)?
Long Exercise Solution

Before we use the normal distribution for the full 40 questions for Who Wants To Win A Swivel
Chair, let’s tackle a simpler problem to make sure it works. Let’s try finding the probability that
we get 5 or fewer questions correct out of 12, where there are only two possible choices for
each question.

Let’s start off by working this out using the binomial distribution. Use the binomial distribution
to find P(X < 6) where X ~ B(12, 0.5).

To find individual probabilities, we use the formula

P(X = r) = nCr x p^r x q^(n - r), where nCr = n! / (r! (n - r)!)

We need to find P(X < 6) where X ~ B(12, 0.5). To do this, we need to find
P(X = 0) through to P(X = 5) and add all the probabilities together.

The individual probabilities are

P(X = 0) = 12C0 x 0.5^0 x 0.5^12 = 0.5^12
P(X = 1) = 12C1 x 0.5^1 x 0.5^11 = 12 x 0.5^12
P(X = 2) = 12C2 x 0.5^2 x 0.5^10 = 66 x 0.5^12
P(X = 3) = 12C3 x 0.5^3 x 0.5^9 = 220 x 0.5^12
P(X = 4) = 12C4 x 0.5^4 x 0.5^8 = 495 x 0.5^12
P(X = 5) = 12C5 x 0.5^5 x 0.5^7 = 792 x 0.5^12

Adding these together gives us an overall probability of

P(X < 6) = (1 + 12 + 66 + 220 + 495 + 792) x 0.5^12
         = 1586 x 0.5^12
         = 0.387 (to 3 decimal places)
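The binomial arithmetic above is easy to get wrong by hand, so here is a quick check in Python. It is a sketch using only the standard library, and the variable names are illustrative.

```python
from math import comb

n, p = 12, 0.5

# P(X < 6) = P(X = 0) + P(X = 1) + ... + P(X = 5)
counts = [comb(n, r) for r in range(6)]   # the six binomial coefficients
p_fewer_than_6 = sum(counts) * p**n       # every term shares the factor 0.5^12

print(counts, round(p_fewer_than_6, 3))
```

Because p and q are both 0.5, every term carries the same factor of 0.5^12, which is why the coefficients can simply be added first.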
Now let’s try using the normal approximation to the binomial and check we get the same result.
First of all, if X ~ B(12, 0.5), what normal distribution can we use to approximate this? Once
you’ve found that, what’s P(X < 6)?

X ~ B(12, 0.5), which means n = 12, p = 0.5 and q = 0.5. A good approximation to this is
X ~ N(np, npq), or X ~ N(6, 3).

We need to find P(X < 6), so we start by calculating the standard score.

z = (x - μ) / σ
  = (6 - 6) / √3
  = 0

Looking this up in probability tables gives us
P(X < 6) = 0.5




The two methods of calculating the probability have given 
quite different results. 

Using the binomial distribution, P(X < 6) comes to 0.387, but using the normal 
distribution it comes to 0.5. We should have been able to use the normal 
distribution in place of the binomial, but the results aren’t close enough. 

What do you think could have gone wrong? How 
do you think we could fix it? 
Revisiting the normal approximation

So what went wrong? Let’s take a closer look at the problem and see if we can 
figure out what happened and also what we can do about it. 


First off, here’s the probability distribution for X 〜 B(12, 0.5). We wanted to find 
the probability of getting fewer than 6 questions correct, and we achieved this by 
calculating P(X < 6). 



We then approximated the distribution by using X ~ N(6, 3), and as we needed to find
P(X < 6) for the binomial distribution, we calculated P(X < 6) using the normal
distribution:



Take a really close look at the two probability distributions. It’s tricky to spot, but 
there’s a crucial difference between the two — the ranges we used to calculate the 
two probabilities are slightly different. We actually used a slightly larger range when 
we used the normal distribution, and this accounts for the larger probability. 

We’ll look at this in more detail on the next page. 
The binomial is discrete, but the normal is continuous

There’s one thing we overlooked when we calculated the two probabilities — we didn’t 
make allowances for one distribution being discrete (the binomial), and the other being 
continuous (the normal). This is important, as the probability range we use can make a 
big difference to the resulting probabilities. 

Here are the probability distributions for X 〜 B(12, 0.5) and N(6, 3), both shown on the 
same chart. We’ve highlighted where the probability range we used with the normal 
distribution extends beyond the range we used for the binomial distribution. 



Can you see where the problem lies?

When we take integers from a discrete probability distribution and translate them onto a 
continuous scale, we don’t just look at those precise values in isolation. Instead, we look 
at the range of numbers that round to each of the values. 

Let’s take the discrete value 6 as an example. When we translate the number 6 to a 
continuous scale, we need to consider all of the numbers that round to it — in other 
words, the entire range of numbers from 5.5 to 6.5. 

[Number line: 5, 5.5, 6, 6.5, 7, showing the values from 5.5 to 6.5 rounding to 6.]

So how does this apply to our probability problem? 

When we tried using the normal distribution to approximate the probability of getting 
fewer than 6 questions correct, we didn’t look at how the discrete value 6 translates onto a 
continuous scale. The discrete value 6 actually covers a range from 5.5 to 6.5, so instead of 
using the normal distribution to find P(X < 6), we should have tried calculating 
P(X < 5.5) instead. 

This adjustment is called a continuity correction. A continuity correction is the small 
adjustment that needs to be made when you translate discrete values onto a continuous scale. 
Apply a continuity correction before calculating the approximation

Let’s try finding P(X < 5.5) where X ~ N(6, 3), and see how good an approximation
this is for the probability of getting five or fewer questions correct. Using the binomial 
distribution we found that the probability we’re aiming for is around 0.387. 

Let’s see how close an approximation the normal distribution gives us. 

We want to find P(X < 5.5) where X ~ N(6, 3), so let’s start by calculating the standard score.

z = (x - μ) / σ
  = (5.5 - 6) / √3
  = -0.29 (to 2 decimal places)

We want to find the probability given by the area Z < -0.29, and looking this up in
standard normal probability tables gives us a probability of 0.3859. In other words,

P(X < 5.5) = 0.3859



This is really close to the probability we came up with using the binomial distribution. The 
binomial distribution gave us a probability of 0.387, so the normal distribution gives us a 
pretty close approximation. 
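The effect of the continuity correction can be seen by putting the three numbers side by side. This is a sketch using Python's standard library; the variable names are assumptions made for the illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 12, 0.5
exact = sum(comb(n, r) for r in range(6)) * p**n   # binomial P(X <= 5)

approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))   # N(6, 3)
uncorrected = approx.cdf(6)     # P(X < 6) with no continuity correction
corrected = approx.cdf(5.5)     # P(X < 5.5) with the continuity correction

print(round(exact, 4), round(uncorrected, 4), round(corrected, 4))
```

The corrected value can differ from a table lookup in the fourth decimal place, because probability tables round the standard score to two decimal places before the lookup.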


BULLET POINTS —— 

■ In particular circumstances you can 
use the normal distribution to 
approximate the binomial. If 

X~ B(n, p) and np > 5 and nq > 5 
then you can approximate X using 
X~ N(np, npq) 


■ If you’re approximating the binomial 
distribution with the normal 
distribution, then you need to apply a 
continuity correction to make sure 
your results are accurate. 
Continuity Corrections Up Close

The big trick with using the normal distribution to approximate binomial probabilities 
is to make sure you apply the right continuity correction. As you’ve seen, small 
changes in the probability range you choose can lead to significant errors in the actual 
probabilities. This might not sound like too big a deal, but using the wrong probability 
could lead to you making the wrong decisions. 

Let’s take a look at the kinds of continuity corrections you need to make for different 
types of probability problems. 



Finding ≤ probabilities

When you work with probabilities of the form P(X ≤ a), the key thing you
need to make sure of is that you choose your range so that it includes the
discrete value a. On a continuous scale, the discrete value a goes up to
(a + 0.5). This means that if you’re using the normal distribution to find
P(X ≤ a), you actually need to calculate P(X < a + 0.5) to come up with a
good approximation. In other words, you add an extra 0.5.


Finding ≥ probabilities

If you need to find probabilities of the form P(X ≥ b), you need to make
absolutely sure that your range includes the discrete value b. The value b
extends down to (b - 0.5) on a continuous scale so you need to use a range of
P(X > b - 0.5) to make sure that you include it. In other words, you need to
subtract an extra 0.5.


Finding “between” probabilities

Probabilities of the form P(a ≤ X ≤ b) need continuity corrections to make
sure that both a and b are included. To do this, we need to extend the range
out by 0.5 either side. To approximate this probability using the normal
distribution, we need to find P(a - 0.5 < X < b + 0.5). This is really just a
combination of the two types above.
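The three kinds of correction can be folded into one helper. The sketch below is an illustration only: `corrected_probability` is a made-up name, and it assumes the bounds are given as inclusive whole numbers.

```python
from statistics import NormalDist

def corrected_probability(dist: NormalDist, low=None, high=None) -> float:
    """Approximate P(low <= X <= high) for a whole-number-valued X.

    low and high are inclusive integer bounds; None means unbounded.
    """
    upper = 1.0 if high is None else dist.cdf(high + 0.5)   # add 0.5 at the top
    lower = 0.0 if low is None else dist.cdf(low - 0.5)     # subtract 0.5 at the bottom
    return upper - lower

approx = NormalDist(mu=6, sigma=3 ** 0.5)                 # N(6, 3)
p_at_most_5 = corrected_probability(approx, high=5)       # uses P(X < 5.5)
p_at_least_6 = corrected_probability(approx, low=6)       # uses P(X > 5.5)
p_4_to_8 = corrected_probability(approx, low=4, high=8)   # uses P(3.5 < X < 8.5)

print(p_at_most_5, p_at_least_6, p_4_to_8)
```

Strict inequalities can be rewritten as inclusive ones before calling the helper. For whole numbers, X < 10 is the same as X ≤ 9, so the upper bound would be passed as 9.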


there are no Dumb Questions


Does it really save time to 
approximate the binomial distribution 
with the normal? 

It can save a lot of time. Calculating 
binomial probabilities can be time-consuming 
because you generally have to work out the 
probability of lots of different values. You 
have no way of simply calculating binomial 
probabilities over a range of values. 

If you approximate the binomial distribution 
with the normal distribution, then it’s a lot 
quicker. You can look probabilities up in 
standard tables and also deal with whole 
ranges at once. 

So is it really accurate? 

Yes, it’s accurate enough for most
purposes. The key thing to remember is that 
you need to apply a continuity correction. 

If you don’t then your results will be less 
accurate. 


What about continuity corrections
for < and >? Do I treat those the same
way as the ones for ≤ and ≥?

There’s a difference, and it all comes
down to which values you want to include
and exclude.

When you’re working out probabilities using
≤ and ≥, you need to make sure that you
include the value in the inequality in your
probability range. So if, say, you need to
work out P(X ≤ 10), you need to make sure
your probability includes the value 10. This
means you need to consider P(X < 10.5).

When you’re working out probabilities using
< or >, you need to make sure that you
exclude the value in the inequality from your
probability range. This means that if you
need to work out P(X < 10), you need to
make sure that your probability excludes 10.
You need to consider P(X < 9.5).


You can approximate the binomial 
distribution with both the normal and 
Poisson distributions. Which should I 
use? 

It all depends on your circumstances. 
If X 〜 B(n, p), then you can use the normal 
distribution to approximate the binomial 
distribution if np > 5 and nq > 5. 

You can use the Poisson distribution to 
approximate the binomial distribution if 
n > 50 and p < 0.1 


Remember, you need to apply a continuity correction when you
approximate the binomial distribution with the normal distribution.
Pool Puzzle

Your job is to take snippets from the
pool and place them into the blank
lines so that you get the right
continuity correction for each
discrete probability range. You
may use the same snippet more
than once, and you won’t need to
use all the snippets.

X < 3         ________________
X > 3         ________________
X ≤ 3         ________________
X ≥ 3         ________________
3 ≤ X < 10    ________________
X = 0         ________________
3 ≤ X ≤ 10    ________________
3 < X ≤ 10    ________________
X > 0         ________________
3 < X < 10    ________________

Note: each thing from
the pool can be used
more than once!
Pool Puzzle Solution

Your job is to take snippets from the
pool and place them into the blank
lines so that you get the right
continuity correction for each
discrete probability range. You
may use the same snippet more
than once, and you won’t need to
use all the snippets.

X < 3         X < 2.5
X > 3         X > 3.5
X ≤ 3         X < 3.5
X ≥ 3         X > 2.5
3 ≤ X < 10    2.5 < X < 9.5
X = 0         -0.5 < X < 0.5
3 ≤ X ≤ 10    2.5 < X < 10.5
3 < X ≤ 10    3.5 < X < 10.5
X > 0         X > 0.5
3 < X < 10    3.5 < X < 9.5

X < 3: Here, we’re looking for values less than 3, so we only want to
include values less than 2.5 in our range.

X ≤ 3: Here, we’re looking for values less than or equal to 3. All the numbers
between 2.5 and 3.5 round to 3, so we need to include values less than
3.5 in our range.

X = 0: All the numbers from -0.5 to 0.5 round to 0, so they need to
be included in the range.

Note: each thing from
the pool can be used
more than once!
What’s the probability of you winning the jackpot on today’s edition of Who Wants to Win a
Swivel Chair? See if you can find the probability of getting at least 30 questions correct out of 40, 
where each question has a choice of 4 possible answers. 
What’s the probability of you winning the jackpot on today’s edition of Who Wants to Win a
Swivel Chair? See if you can find the probability of getting at least 30 questions correct out of 40,
where each question has a choice of 4 possible answers.

If X is the number of questions we get right, then we want to find P(X ≥ 30) where X ~ B(40, 0.25).

As np and nq are both greater than 5, it’s appropriate for us to use the normal distribution to approximate
this probability. np = 10, nq = 30 and npq = 7.5, which means we need to find P(X > 29.5) where X ~ N(10, 7.5).

Let’s start by finding the standard score.

z = (x - μ) / σ
  = (29.5 - 10) / √7.5
  = 19.5 / 2.74
  = 7.12 (to 2 decimal places)

This is way beyond the end of standard normal probability tables, so the probability of Z being less
than 7.12 is 1 to four decimal places. This means that

P(X > 29.5) = 1 - 1.0000
            = 0 (to 4 decimal places)

So, looks like
you've got practically no
chance at that swivel chair.
If you lose, you'll miss out on our
great consolation prize. Why don’t
you take the prize and run?
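The swivel chair probability can also be checked numerically. This is a minimal sketch of the normal approximation with a continuity correction, using Python's standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 40, 0.25
q = 1 - p

# np = 10 and nq = 30 are both greater than 5, so we use N(np, npq).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * q))   # N(10, 7.5)

# P(X >= 30) for the binomial becomes P(X > 29.5) for the normal.
z = (29.5 - approx.mean) / approx.stdev
p_at_least_30 = 1 - approx.cdf(29.5)

print(round(z, 2), p_at_least_30)
```

Note that the standard score divides by the standard deviation, which is the square root of the variance npq, not by the variance itself.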












Sorry to see you go. It's been great
having you back as a contestant on the show,
but we've just had an urgent email from
someone called Dexter...


Sharpen your pencil Solution

Here are the first five questions for Round Two.
The questions are all about the game show host.

1. What is his favorite film?
   A: The Day of The Jackal
   B: The Italian Job
   C: Lawrence of Arabia
   D: All the President's Men

2. What is the favorite film of his cat?
   A: A Fish Called Wanda
   B: Curse of the Were-Rabbit
   C: Mousehunt
   D: Bird on a Wire

3. On average, how much does he spend on suits each month?
   A: $1,000
   B: $2,000
   C: $3,000
   D: $4,000

4. How often does he have his hair cut?
   A: Once a month
   B: Twice a month
   C: Three times a month
   D: Four times a month

5. What is his favorite web site?
   A: www.fatdanscasino.com
   B: www.gregs-list.net
   C: www.you-cube.net
   D: www.starbuzzcoffee.com
The Normal Distribution Exposed

This week’s interview:
Why Being Normal Isn’t Dull



Head First ： Hey, Normal, glad you could make it 
on the show. 

Normal ： Thanks for inviting me, Head First. 

Head First ： Now, my first question is about your 
name. Why are you called Normal? 

Normal: It’s really because I’m so representative
of a lot of types of data. They have a probability
distribution with a distinctive, smooth,
bell-curved shape, and that’s me. I’m
something of an ideal.

Head First: Can you give me an example?

Normal ： Sure. Imagine you have a baker’s shop that 
sells loaves of bread. Now, each loaf of a particular 
sort of bread should theoretically weigh about the 
same, but in practice, the actual weight of each loaf 
of bread will vary. 

Head First ： But surely they’ll all weigh about the 
same? 

Normal ： More or less, but with variation. I model 
that variation. 

Head First ： So why’s that so important? 

Normal ： Well, it means that you can use me to work 
out probabilities. Say you want to find the probability 
of a randomly chosen loaf of bread being below a 
particular weight. That sounds like something that 
could be quite difficult, but with me, it’s easy. 

Head First ： Easy? How do you mean? 

Normal ： With a lot of the other probability 
distributions, there can be lots of complicated 
calculations involved. With Binomial you have 
factorials, and with Poisson you have to work with 
exponentials. With me there’s none of that. Just look 
me up in a table and away you go. 


Head First: Surely it’s not quite as simple as that? 

Normal ： Well, you do have to convert me to a 
standard score first, but that’s nothing, not in the 
grand scheme of things. 

Head First: So tell me, do you think you’re better 
than the other probability distributions? 

Normal ： I wouldn’t say that I’m better as such, 
but I’m a lot more flexible, and I’m useful in lots 
of situations. I’m also a lot more robust. When 
the numbers get high for Poisson and Binomial 
distributions, they run into trouble. Mind you, I do 
what I can to help out. 

Head First: You do? How? 

Normal ： Well under certain circumstances both 
Binomial and Poisson look like me. It’s uncanny; 
they’re often stopped at parties by people asking 
them if they’re Normal. I tell them to take it as a 
compliment. 

Head First ： So how does that help? 

Normal ： Well, because they look like me, it means 
that you can actually use my probability tables to 
work out their probabilities. How cool is that? No 
more late nights slaving over a calculator; just look it 
up. 

Head First ： I’m afraid that’s all we’ve got time for 
tonight. Normal, thanks for coming along, it’s been a 
pleasure. 

Normal ： You’re welcome, Head First. 
All aboard the Love Train


Remember Dexter’s Love Train? He’s started running trials of the ride, and everyone 
who’s given it a trial run thinks it’s great. There’s just one problem: sometimes the ride 
breaks down and causes delays, and delays cost money. 

Dexter’s found some statistics on the Internet about the model of roller coaster he’s been
trying out, and according to one site, you can expect the ride to break down 40 times a
year.

40 times a year?! If the ride
breaks down on someone's
wedding day, they'll sue!


Given the huge profit the Love Train is bound to make, Dexter thinks that 
it’s still worth going ahead with the ride if there’s a high probability of it 
breaking down less than 52 times a year. 

So how do we work out that probability? 


What sort of probability distribution does this follow? How would 
you work out the probability of the ride breaking down less than 
52 times in a year? 
Sharpen your pencil Solution

What sort of probability distribution does this follow? How would
you work out the probability of the ride breaking down less than
52 times in a year?

Situations where you’re dealing with things breaking down at a mean rate follow a Poisson distribution
with a parameter λ. If X represents the number of breakdowns in a year, then X ~ Po(40).

We need to find P(X < 52). To find this, we’d need to find each individual probability for all values of X
up to 51.



Working out that probability 
is gonna be tricky and time- 
consuming. I wonder if we can take 
a shortcut like we did with the 
binomial. 



Under certain circumstances, the shape of the 
Poisson distribution resembles that of the normal. 

The advantage of this is we can use standard normal probability tables 
to work out whole ranges of probabilities. This means that we don’t 
have to calculate lots of individual probabilities in order to find what 
we need. 

Approximating the Poisson distribution with the normal is very similar 
to when you use the normal in place of the binomial. Once you have 
the right set of circumstances, you take the Poisson mean and variance, 
and use them as parameters in a normal distribution. 

If X ~ Po(λ), this means that the corresponding normal
approximation is X ~ N(λ, λ). But when is this true?

It all comes down to the shape of the distribution.

When to approximate the Poisson distribution with the normal

We can use the normal distribution to approximate the Poisson whenever the 
Poisson distribution adopts a shape that’s like the normal, but when does this 
happen? Let’s take a look. 


When λ is small...

When λ is small, the shape of the Poisson distribution is different from
that of the normal distribution. The shape isn’t symmetrical, and the
curve looks as though it’s “pulled” over to the right.

As the Poisson distribution doesn’t resemble the normal for small values
of λ, the normal distribution isn’t a suitable approximation for the
Poisson distribution where λ is small.

When λ is large...

As λ gets larger, the shape of the Poisson distribution looks increasingly
like that of the normal distribution. The main part of the shape is
reasonably symmetrical, and it forms a smooth curve that’s just like the
one for the normal.

This means that as λ gets larger, the normal distribution can be used to
give a better and better approximation of it.

So how large is large enough?

We’ve seen that the Poisson resembles the normal distribution when λ is large, but how
big does it have to be before we can use the normal?

λ actually gets sufficiently large when λ is greater than 15. This means that if X ~ Po(λ)
and λ > 15, we can approximate this using X ~ N(λ, λ).



Vital Statistics
Approximating the Poisson Distribution

If X ~ Po(λ) and λ > 15, you can use X ~ N(λ, λ) to
approximate it.
The number of breakdowns on Dexter’s Love Train follows a Poisson distribution where λ = 40.
What’s the probability that there will be fewer than 52 breakdowns in the first year?

Hint: Use a normal approximation, and
remember your continuity
corrections.
Exercise

It’s time to test your statistical knowledge. Complete the table below, saying what normal
distribution suits each situation, and what conditions there are.

Situation | Distribution | Condition
X + Y, where X ~ N(μ_x, σ_x²), Y ~ N(μ_y, σ_y²) | X + Y ~ N(μ_x + μ_y, σ_x² + σ_y²) | X, Y are independent
X - Y, where X ~ N(μ_x, σ_x²), Y ~ N(μ_y, σ_y²) | |
aX + b, where X ~ N(μ, σ²) | |
X1 + X2 + ... + Xn, where X ~ N(μ, σ²) | |
Normal approximation of X, where X ~ B(n, p) | |
Normal approximation of X, where X ~ Po(λ) | |
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exercise solution 



The number of breakdowns on Dexter’s Love Train follows a Poisson distribution where λ = 40.
What’s the probability that there will be fewer than 52 breakdowns in the first year?


If we use X to represent the number of breakdowns in a year, then X ~ Po(40).

As λ is large, we can use the normal distribution to approximate this. In other words, we use

X ~ N(40, 40)

We need to find the probability that there are fewer than 52 breakdowns. As we’re approximating a discrete
probability distribution with a continuous one, we have to apply a continuity correction. We don’t want to
include 52, so we need to find P(X < 51.5).

Before we can find the probability using standard normal tables, we need to calculate the standard score.

z = (x − μ) / σ
  = (51.5 − 40) / √40
  = 1.82 (to 2 decimal places)

Looking this up in probability tables gives us 0.9656. This means that the probability of there being fewer
than 52 breakdowns in a year is 0.9656.
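If you’d like to check this kind of answer by computer, here is a minimal sketch using only Python’s standard library. It works out the probability two ways: by adding up the Poisson probability of every value from 0 to 51, and by using the normal approximation with the continuity correction. The variable names are just for illustration.

```python
import math

lam = 40  # breakdowns per year, taken from the exercise

# Exact route: add up P(X = k) for every k from 0 to 51.
term = math.exp(-lam)            # P(X = 0)
exact = term
for k in range(1, 52):
    term *= lam / k              # P(X = k) obtained from P(X = k - 1)
    exact += term

# Normal route: X ~ N(40, 40) with a continuity correction, so P(X < 51.5).
z = (51.5 - lam) / math.sqrt(lam)
approx = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"z = {z:.2f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact Poisson        = {exact:.4f}")
```

The approximation can differ from the table value in the last decimal place because the code doesn’t round z before looking it up. Notice how the exact route needs 52 separate probabilities, while the normal route needs a single standard score.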
Exercise Solution


It’s time to test your statistical knowledge. Complete the table below, saying what normal
distribution suits each situation, and what conditions there are.


Situation: X + Y, where X ~ N(μ_x, σ²_x), Y ~ N(μ_y, σ²_y)
Distribution: X + Y ~ N(μ_x + μ_y, σ²_x + σ²_y)
Condition: X, Y are independent

Situation: X − Y, where X ~ N(μ_x, σ²_x), Y ~ N(μ_y, σ²_y)
Distribution: X − Y ~ N(μ_x − μ_y, σ²_x + σ²_y)
Condition: X, Y are independent

Situation: aX + b, where X ~ N(μ, σ²)
Distribution: aX + b ~ N(aμ + b, a²σ²)
Condition: a, b are constant values

Situation: X₁ + X₂ + ... + Xₙ, where X ~ N(μ, σ²)
Distribution: X₁ + X₂ + ... + Xₙ ~ N(nμ, nσ²)
Condition: X₁, X₂, ..., Xₙ are independent observations of X

Situation: Normal approximation of X, where X ~ B(n, p)
Distribution: X ~ N(np, npq)
Condition: np > 5, nq > 5. Continuity correction required.

Situation: Normal approximation of X, where X ~ Po(λ)
Distribution: X ~ N(λ, λ)
Condition: λ > 15. Continuity correction required.
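The last two rows of the table can be turned into a quick check. The sketch below is only an illustration, and the function names are made up: each function returns the mean and variance of the approximating normal distribution, or None if the condition in the table isn’t met.

```python
def approximate_binomial(n, p):
    """Mean and variance of the normal approximating B(n, p), or None."""
    q = 1 - p
    if n * p > 5 and n * q > 5:      # condition from the table
        return n * p, n * p * q      # X ~ N(np, npq)
    return None

def approximate_poisson(lam):
    """Mean and variance of the normal approximating Po(lam), or None."""
    if lam > 15:                     # condition from the table
        return lam, lam              # X ~ N(lam, lam)
    return None

print(approximate_binomial(100, 0.5))
print(approximate_binomial(10, 0.1))
print(approximate_poisson(40))
print(approximate_poisson(10))
```

Whenever one of these functions returns a pair of values, remember that you still need a continuity correction when you calculate probabilities.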
BULLET POINTS

■ In particular circumstances you
can use the normal distribution to
approximate the Poisson.

■ If X ~ Po(λ) and λ > 15 then you can
approximate X using X ~ N(λ, λ).

■ If you’re approximating the Poisson
distribution with the normal
distribution, then you need to apply
a continuity correction to make sure
your results are accurate.


Q: You can approximate the binomial
and Poisson distributions with the
normal, but what about the geometric
distribution? Can the normal distribution
ever approximate that?

A: We were able to use the normal
distribution in place of the binomial and
Poisson distributions because under
particular circumstances, these distributions
adopt the same shape as the normal.

The geometric distribution, on the other hand,
never looks like the normal, so the normal
can never effectively approximate it.


Q: Do I have to use a continuity
correction if I approximate the Poisson
distribution with the normal distribution?

A: Yes. This is because you’re
approximating a discrete probability
distribution with a continuous one. This
means that you need to apply a continuity
correction, just as you would for the binomial
distribution.


Q: What’s the advantage of
approximating the binomial or Poisson
distribution with the normal? Won’t my
results be more accurate if I just use the
original distribution?

A: Your results will be more accurate if
you use the original distribution, but using
it can be time-consuming. If you wanted
to find the probability of a range of values
using the binomial or Poisson distribution,
you’d need to find the probability of every
single value within that range. Using the
normal distribution, on the other hand, you
can look up probabilities for whole ranges,
and so they’re a lot easier to find.


Use a continuity correction if you approximate the Poisson distribution
with the normal distribution.


A runaway success! 

Thanks to your savvy statistical analysis, the Love Train is open for business, and
demand has outstripped Dexter’s highest expectations. Here are some of Dexter’s 
happy customers: 
10 using statistical sampling



Statistics deal with data, but where does it come from? 

Some of the time, data’s easy to collect, such as the ages of people attending a health 
club or the sales figures for a games company. But what about the times when data isn’t 
so easy to collect? Sometimes the number of things we want to collect data about are so 
huge that it’s difficult to know where to start. In this chapter, we’ll take a look at how you 
can effectively gather data in the real world, in a way that’s efficient, accurate, and can 
also save you time and money to boot. Welcome to the world of sampling. 


The Mighty Gumball taste test

Mighty Gumball is the leading vendor of a wide variety of candies and chocolates. Their 
signature product is their super-long-lasting gumball. It comes in all sorts of colors to suit 
all tastes. 

Mighty Gumball plans to run a series of television commercials to attract even more 
customers, and as part of this, they want to advertise just how long the flavor of their 
gumballs lasts for. The problem is, how do they get the data? 

They’ve decided to implement a taste test, and they’ve hired a bunch of tasters to help 
with the tests. There are just two problems: the tasters are using up all of the gumballs, 
and their dental plans are costing the company a fortune. 



Well, gumball #1,466 ran
out of flavor after 55 minutes,
but gumball #1,467 is still going
strong after an hour...


Please, no more gumballs!
I’m running out of teeth!


They're running out of gumballs 

The fatal flaw with the Mighty Gumball taste test is that the tasters are trying out 
all of the gumballs. Not only is this having a bad effect on the tasters’ teeth, it 
also means that there are no gumballs left to sell. After all, they can hardly reuse 
their gumballs once the tasters have finished with them. 

The whole point of the taste test is for Mighty Gumball to figure out how long 
the flavor lasts for. But does this really mean that the tasters have to try out every 
single gumball? 





What would you do to establish how long the gumball 
flavor lasts for? What do you need to consider? Write 
your answer below in as much detail as possible. 






Test a gumball sample, not the whole gumball population


Mighty Gumball is running into problems because they’re tasting every single gumball 
as part of their taste test. It’s costing them time, money, and teeth, and they have no 
gumballs left to actually sell to their customers. 


So what should Mighty Gumball do differently? Let’s start by looking at the difference 
between populations and samples. 




Gumball populations 

At the moment, Mighty Gumball is carrying out their taste test using every single 
gumball that they have available. In statistical terms, they are conducting their test 
using an entire population. 


A statistical population refers to the entire group of things that you’re trying to 
measure, study, or analyze. It can refer to anything from humans to scores to gumballs. 
The key thing is that a population refers to all of them. 


A census is a study or survey involving the entire population, so in the case of Mighty 
Gumball, they’re conducting a census of their gumball population by tasting every 
single one of them. A census can provide you with accurate information about your 
population, but it’s not always practical. When populations are large or infinite, it’s 
just not possible to include every member. 



Gumball samples 


You don’t have to taste every gumball to get an idea of how long the 
flavor lasts for. Instead of testing the entire population, you can test a 
sample instead. 

A statistical sample is a selection of items taken from a population. You 
choose your sample so that it’s fairly representative of the population as a 
whole; it’s a representative subset of the population. For Mighty Gumball, 
a sample of gumballs means just a small selection of gumballs rather 
than every single one of them. 


A study or survey involving just a sample of the population is called a 
sample survey. A lot of the time, conducting a sample survey is more practical
than a census. It’s usually less time-consuming and expensive, as you don’t 
have to deal with the entire population. And because you don’t use the 
whole population, taking a sample survey of the gumballs means that 
there’ll be plenty left over when you’re done. 


So how can you use samples to find out about a population? Let’s see. 





How sampling works


The key to creating a good sample is to choose one that is as close a match to your
population as possible. If your sample is representative, this means it has similar
characteristics to the population. And this, in turn, means that you can use your
sample to predict what characteristics the population will have.

Suppose you use a representative sample of gumballs to test how long the flavor of
each gumball lasts for. The distribution of the results might look something like this:


Even though you’ve only tried a small sample of
gumballs, you still have an impression of the shape
of the distribution, and the more gumballs you try,
the clearer the shape is. As an example, you can
get a rough impression of where the center of the
population distribution is by looking at the shape of
the sample distribution.

Let’s compare this with the actual population:


Population Chart (horizontal axis: duration)


Here’s the chart for the population. Can you
see how closely the sample and population
distributions agree?

If you compare the two charts, the overall shape
is very similar, even though one is for all of the
gumballs and the other is for just some of them.
They share key characteristics such as where the
center of the data is, and this means you can use
the sample data to make predictions about the
population.
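To see how a sample can mirror its population, here is a small simulation sketch. Everything in it is made up for illustration: we assume flavor durations follow a normal distribution with a mean of 60 minutes and a standard deviation of 8 minutes, and we invent a population of 100,000 gumballs.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: flavor duration in minutes for 100,000 gumballs.
population = [random.gauss(60, 8) for _ in range(100_000)]

# A random sample of 200 gumballs, chosen without replacement.
sample = random.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean = {population_mean:.1f} minutes")
print(f"sample mean     = {sample_mean:.1f} minutes")
```

Even though the sample contains only a tiny fraction of the gumballs, its center should land close to the center of the population.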
When sampling goes wrong

If only we could guarantee that every sample was a close match to the 
population it comes from. Unfortunately, not every sample closely resembles its
population. This may not sound like a big deal, but using a misleading sample 
can actually lead you to draw the wrong conclusions about your population. 

As an example, imagine if you took a sample of gumballs to find out how long 
flavor typically lasts for, but your sample only contained red gumballs. Your 
sample might be representative of red gumballs, but not so representative of 
all gumballs in the population. If you used the results of this sample to gather 
information about the general gumball population, you could end up with a 
misleading impression about what gumballs are generally like. 

Using the wrong sample could lead you to draw wrong conclusions about 
population parameters, such as the mean or standard deviation. You might be left 
with a completely different view of your data, and this could lead you to make 
the wrong decisions. 

The trouble is, you might not know this at the time. You might think your 
population is one thing when in fact it’s not. We need to make sure we have some 
mechanism for making sure our samples are a reliable representation of the 
population. 
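The red gumball example can be sketched in code. The numbers are invented purely for illustration: we assume red gumballs keep their flavor for about 70 minutes and every other color for about 50 minutes.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch gives the same result each run

# Hypothetical typical flavor duration, in minutes, for each color.
typical_minutes = {"red": 70, "yellow": 50, "green": 50, "pink": 50}

# Hypothetical population: 5,000 gumballs of each color.
population = [(color, random.gauss(typical_minutes[color], 5))
              for color in typical_minutes
              for _ in range(5_000)]

population_mean = statistics.mean(minutes for _, minutes in population)

# A misleading sample: 100 gumballs drawn from the red ones only.
red_only = [minutes for color, minutes in population if color == "red"]
red_sample_mean = statistics.mean(random.sample(red_only, 100))

print(f"population mean      = {population_mean:.1f} minutes")
print(f"red-only sample mean = {red_sample_mean:.1f} minutes")
```

Under these assumptions the red-only sample overstates how long flavor typically lasts, and nothing in the sample itself warns you about it.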



This sample might not be the most accurate representation of this population. We want a
representative sample instead of a sample gone wrong.
Five Minute Mystery


The Case of the Lost Coffee Sales

The Starbuzz CEO has an idea for a brand-new coffee he wants to
sell in his coffee shops, but he’s not sure how popular it’s going to be
with his customers. He asks his new intern to conduct a survey to help
predict the customers’ opinions. The intern will ask customers to taste
the new brew, and tell him what they think.



The intern is really happy to be given such a great opportunity. 
First off, he’s been told that if he does the job well, he stands to 
get a bonus at the end of the month. Secondly, he gets to give 
out free coffee to friendly Starbuzz customers and hear lots of 
positive things. Thirdly, he’s been looking for an excuse to talk to 
one particular girl who’s a regular visitor to his local coffee shop, 
and this could be just the break he needs. 


After the intern conducts his survey, he’s delighted to tell the CEO
that everyone loves the new coffee, and it’s bound to be a huge success.
“That’s great,” says the CEO. “We’ll launch it next season.”

When the new coffee is finally launched, sales are poor, and the CEO
has to cancel the range. What do you think went wrong?

Why didn’t the new coffee sell well?


How to design a sample 

You use samples to make inferences about the population in general, and to 
make sure you get accurate results, you need to choose your sample wisely. Let’s 
start off by pinning down what your population really is, so you can get as 
representative a sample as possible. 

Define your target population

The first thing to be clear about is what your target population is so that you know
where you’re collecting your sample from. By target population, we mean the group
that you’re researching and want to collect results for. The target population you
choose depends, to a large extent, on the purpose of your study. For example, do you 
want to gather data about all the gumballs in the world, one particular brand, or one 
particular type? 

Try to be as precise as possible, as that way it’s easier to make your sample as 
representative of your population as possible. 



We need data about Mighty Gumball’s gumballs, so your target population is all of Mighty Gumball’s gumballs.


Define your sampling units

Once you’ve defined your target population, you need to decide what sort of object 
you’re going to sample. Normally these will be the sorts of things you described when 
you defined your target population. As an example, this could be a single gumball or 
maybe a packet of gumballs. 



The sampling unit in the taste test is a single gumball.


Define your sampling frame

Finally, you need a list of all the sampling units within
your target population, preferably with each sampling
unit either named or numbered. This is called the
sampling frame. It’s basically a list from which you can
choose your sample.

Sometimes it’s not possible to come up with a list that covers
the entire target population. As an example, if you want to
collect the views of people living within a certain area, people
moving in or out of an area can affect who you have on your
list of names. If you’re dealing with similar objects such as
gumballs, it might not be possible or practical to name or
number each one.

Part of the sampling frame for the taste test: Gumball #1897652, Gumball #1897653,
Gumball #1897654, and so on.


This seems like a
waste of time. Do I have to
do all of these things? Can’t
I just sample gumballs?


If you don’t design your sample well, 
your sample may not be accurate. 

Designing your sample can take a bit of extra 
preparation time, but this is much better than 
spending time and money on a survey only to find 
the results are inaccurate. You will have lost time 
and money doing the survey, and what’s more, 
someone might make wrong decisions based on it. 


A poorly designed sample can introduce bias. 
Let’s look at this in more detail. 
Sometimes samples can be biased

Not every sample is fair. Unless you’re very careful, some sort of bias can creep
in to the sample, which can distort your results. Bias is a sort of favoritism that
you can unwittingly (or maybe knowingly) introduce into your sample, meaning
that your sample is no longer randomly selected from your population.

If a sample is unbiased, then it’s representative of the population. It’s a fair
reflection of what the population is like.


Unbiased samples

An unbiased sample is representative of
the target population. This means that it has
similar characteristics to the population, and
we can use these to make inferences about
the population itself.

The shape of the distribution of an unbiased
sample is similar to the shape of the
population it comes from. If we know the
shape of the sample distribution, we can
use it to predict that of the population to a
reasonable level of confidence.


Biased samples

A biased sample is not representative of
the target population. We can’t use it to make
inferences about the population because
the sample and population have different
characteristics. If we try to predict the shape
of the population distribution from that of
the sample, we’d end up with the wrong
result.



This sounds hopeless. How can I be certain I 
avoid bias? Where does it come from anyway? 


Sources of bias 


So how does bias creep into samples? Through any of the following and more: 



A sampling frame where items have been left off, such that 
not everything in the target population is included. If it’s not in your 
sampling frame, it won’t be in your sample. 



An incorrect sampling unit. Instead of individual gumballs, maybe 
the sampling unit should have been boxes of gumballs instead. 



Individual sampling units you chose for your sample weren’t 
included in your actual sample. As an example, you might send 
out a questionnaire that not everybody responds to. 



Poorly designed questions in a questionnaire. Design your 
questions so that they’re neutral and everyone can answer them. An 
example of a biased question is “Mighty Gumball candy is tastier than 
any other brand, do you agree?” It would be better to ask the person 
being surveyed for the name of their favorite brand of confectionery.



Samples that aren’t random. As an example, if you’re conducting 
a survey on the street, you may avoid questioning anyone that looks too 
busy to stop, or too aggressive. This means that you exclude aggressive 
or busy-looking people from your survey. 




As you can see, there are lots of sources of bias, and a lot 
of it comes down to how you choose your sample. 

We need to take a look at ways in which you can choose your sample to minimize 
the chances of introducing bias. 
There are no Dumb Questions


Q: So is the sampling frame a list of
everything that we’re sampling?

A: The sampling frame lists all the
individual units in the population, and it’s
used as a basis for the sample. It’s not the
sample itself, as we don’t sample everything
on it.

Q: How do I put together the sampling
frame?

A: How you do it and what you use
depends on your target population. As an
example, if your target population is all car
owners, then you can use a list of registered
car owners. If your target population is all
the students attending a particular college,
you can use the college registrar.

Q: How about things like telephone
listings? Can I use those for my sampling
frame?

A: It all depends on your target population.
Telephone listings exclude households
without a telephone, and there may also
be households who have elected not to be
listed. If your target population is households
with a listed telephone number, then using
telephone lists is a good idea. If your target
population is all households with a telephone
or even all telephones, then your sampling
frame won’t be entirely accurate, and that
can introduce bias.

Q: Can I always compile a sampling
frame?

A: Not always. Imagine if you had to
survey all the fish in the sea. It would be
impossible to name and number every
individual fish.

Q: Will I always have to have a target
population?

A: Yes. You need to know what your
target population is so that you can make
sure your sample is representative of it.
Thinking carefully about what your target
population is can help you avoid bias.

If you’re sampling for someone else, get
as much detail as possible about who the
target population should be. Make sure you
know exactly what is included and what is
excluded.

Q: Why is bias so bad?

A: Bias is bad because it can mislead
you into drawing wrong conclusions about
your target population, which in turn can
lead you into making wrong decisions. If, for
example, you only sampled pink gumballs,
your survey results might be accurate for
all pink gumballs, but not for all gumballs in
general. There may be significant differences
between the different color gumballs.

Q: How can the questions in a
questionnaire cause bias?

A: Bias often creeps in through the
phrasing of questions.

First off, if you present a series of statements
and ask respondents to agree or disagree,
it’s more likely that people will agree unless
they have strong negative feelings. This
means that the results of your survey will be
biased towards people agreeing.

Bias can also occur if you give a set of
possible answers that don’t cover all
eventualities. As an example, imagine you
need to ask people how often they exercise
in a typical week. You would introduce bias
if you give answers such as “more than 5
times a week,” “3-5 times a week,” “1-2
times a week,” and “I don’t value my health,
so I don’t exercise.” Someone may not
exercise, but disagree with the statement
that they don’t value their health. This would
mean that they wouldn’t be able to answer
the question.
Sharpen your pencil


Look at the following scenarios. What would you choose as a 
target population? What’s the sampling unit? How would you 
develop a sampling frame? What other things might you need to 
consider when forming your sample? 


1. Choc-O-Holic Inc. manufactures chocolates, and they have just finished a limited 
edition run of chocolates for the holiday season. They want to check the quality of 
those chocolates. 


2. The Statsville Health Club wants to conduct a survey to see what their customers 
think of their facilities. 
Sharpen your pencil Solution



Look at the following scenarios. What would you choose as a 
target population? What’s the sampling unit? How would you 
develop a sampling frame? What other things might you need to 
consider when forming your sample? 


1. Choc-O-Holic Inc. manufactures chocolates, and they have just finished a limited- 
edition run of chocolates for the holiday season. They want to check the quality of 
those chocolates. 

The target population is all the chocolates in the limited-edition run.

The sampling unit is one chocolate.

The sampling frame needs to cover all the chocolates; as it’s a limited-edition run, it’s possible that
Choc-O-Holic has records of how many chocolates are in the run, including numbers of each type of
chocolate.

For the sample, you need to make sure that it is representative of the target population and
unbiased. If there are different types of chocolate in the run, you’d need to make sure that you included
each sort of chocolate.


2. The Statsville Health Club wants to conduct a survey to see what their customers 
think of their facilities. 

The target population is all the customers of the Statsville Health Club.

The sampling unit is one customer.

The sampling frame needs to cover all of the customers. It’s likely that the health club has a list of
registered customers, so you could use this as the sampling frame.

As before, you need to make sure your sample is representative of the population and unbiased. You’d
need to make sure each of the classes is fairly represented by customer gender, customer age group,
and so on.


Solved: The Case of the Lost Coffee Sales 


Why didn’t the coffee sell well?

We don’t know for certain, but there’s a very good chance that the 
sample of people surveyed by the intern wasn’t 
representative of the target population. 

First of all, the intern was looking forward to giving 
away free coffee to friendly Starbuzz customers and 
hearing positive things. Does this mean he only spoke 
to customers who looked friendly to him? Did he get their 
real opinions about the coffee, or did he only ask them whether they 
agreed it tasted nice? 



Five Minute Mystery Solved


The intern also hoped to use the job as an opportunity to speak to a girl 
at his local coffee shop. Did he spend most of his time in this particular 
coffee shop? Did the girl influence his sample choice? 


Finally, the CEO launched the new coffee in a different season from the 
one in which the survey took place, and this may have affected sales too. 


Any or all of these factors could have led to the sample being
misleading, which in turn led to the wrong decision being made. 
How to choose your sample

We’ve looked at how to design your sample and explored types of bias that need
to be avoided. Now we need to select our actual sample from the sampling frame.
But how should we go about this? 

Simple random sampling 

One option is to choose the sample at random. Imagine you have a 
population of N sampling units, and you need to pick a sample of n 
sampling units. Simple random sampling is where you choose a 
sample of n using some random process, and all possible samples of size 
n are equally likely to be selected. 


With simple random sampling, you have two options. You can either 
sample with replacement or without replacement. 


Sampling with replacement 

Sampling with replacement means that when you’ve selected each 
unit and recorded relevant information about it, you put it back into the 
population. By doing this, there’s a chance that a sampling unit might 
be chosen more than once. You’d be sampling with replacement if you 
decided to question people on the street at random without checking 
if you had already questioned them before. If you stop a person for 
questioning and then let them go once you’ve finished asking them 
questions, you are in effect releasing them back into the population. It 
means that you may question them more than once. 


Sampling without replacement 

Sampling without replacement means that the sampling unit isn’t 
replaced back into the population. An example of this is the gumball 
taste test; you wouldn’t want to put gumballs that have been tasted back 
into the population. 
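The difference between the two options is easy to see in code. The sketch below uses Python’s standard random module on a made-up sampling frame of 1,000 numbered gumballs: random.sample picks without replacement, while random.choices picks with replacement.

```python
import random

random.seed(3)  # fixed seed so the sketch gives the same result each run

# Hypothetical sampling frame: gumballs numbered 1 to 1,000.
frame = list(range(1, 1001))
n = 20

# Without replacement: each gumball can be picked at most once.
without_replacement = random.sample(frame, n)

# With replacement: every pick is made from the full frame, so repeats can occur.
with_replacement = random.choices(frame, k=n)

print(sorted(without_replacement))
print(sorted(with_replacement))
```

For the taste test you’d want the first version, since a gumball that’s been tasted can’t go back into the population.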



You wouldn’t put tasted gumballs back, so this would be simple random sampling without replacement.


How to choose a simple random sample 


There are two main ways of using simple random sampling: by drawing lots or 
using random numbers. 


Drawing lots

Drawing lots is just like pulling names out of a hat. You write the 
name or number of each member of the sampling frame on a piece 
of paper or ball, and then place them all into a container. You then 
draw out n names or numbers at random so that you have enough 
for your sample. 


Random number generators 



If you have a large sampling frame, drawing lots might not be 
practical, so an alternative is to use a random number generator, 
or random number tables. For this, you give each member of the 
sampling frame a number, generate a set of n random numbers, 
and then pick the members of the set whose assigned numbers 
correspond to the random numbers that were generated. 

It’s important to make sure that each number has an equal chance 
of occurring so that there’s no bias.
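Here is a minimal sketch of the random number approach, assuming a small made-up sampling frame of 26 customers. Each member gets a number, the generator draws n distinct numbers with equal chances, and the sample is whoever holds those numbers.

```python
import random

# Hypothetical sampling frame: 26 named customers.
frame = [f"Customer {chr(ord('A') + i)}" for i in range(26)]

# Give each member of the sampling frame a number, starting at 1.
numbered = dict(enumerate(frame, start=1))

n = 5
rng = random.Random(4)  # fixed seed so the sketch gives the same result each run

# Generate n distinct random numbers, each equally likely to come up.
chosen_numbers = rng.sample(range(1, len(frame) + 1), n)

# Pick the members whose assigned numbers were generated.
sample = [numbered[number] for number in chosen_numbers]
print(sample)
```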






Simple random sampling isn’t without its problems. What do you think 
could go wrong with it? 


There are other types of sampling 

Even simple random sampling has its problems. 

With simple random sampling, there’s still a chance that your sample will not 
represent the target population. For example, you might end up randomly 
drawing only yellow gumballs for your sample, and the other colors would be left 
out. 

So how can we avoid this? 


We can use stratified sampling...


An alternative to simple random sampling is stratified sampling. With this type of
sampling, the population is split into similar groups that share similar characteristics.
These characteristics or groups are called strata, and each individual group is called
a stratum. As an example, we could split up the gumballs into the different colors, 
yellow, green, red, and pink, so that each color forms a different stratum. 


Once you’ve done this, you can perform simple random sampling on each stratum to 
ensure that each group is represented in your overall sample. To do this, look at the 
proportions of each stratum within the overall population and take a proportionate 
number from each. As an example, if 50% of the gumballs that Mighty Gumball 
produce are red, half of your sample should consist of red gumballs. 
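Stratified sampling with proportionate numbers can be sketched as follows. The color counts are invented for illustration: we assume a population of 1,000 gumballs, of which 50% are red, 20% yellow, 20% green, and 10% pink.

```python
import random

random.seed(5)  # fixed seed so the sketch gives the same result each run

# Hypothetical number of gumballs of each color in the population.
counts = {"red": 500, "yellow": 200, "green": 200, "pink": 100}

# Each color forms a stratum: a list of that color's sampling units.
strata = {color: [f"{color} #{i}" for i in range(1, count + 1)]
          for color, count in counts.items()}

total = sum(counts.values())
sample_size = 50

sample = []
for color, units in strata.items():
    # Take a share of the sample in proportion to the stratum's size.
    # With other numbers, rounding could make the shares miss sample_size.
    share = round(sample_size * len(units) / total)
    # Simple random sampling within the stratum.
    sample.extend(random.sample(units, share))

print(len(sample), "gumballs sampled")
```

With these numbers, half of the sample is red, matching the 50% of the population that is red.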


Each color is a separate stratum.


...or we can use cluster sampling... 

Cluster sampling is useful if the population has a number of similar 
groups or clusters. As an example, gumballs might be sold in packets, 
with each packet containing a similar number of gumballs with similar 
colors. Each packet would form a cluster. 

With cluster sampling, instead of taking a simple random sample of 
units, you draw a simple random sample of clusters, and then 
survey everything within each of these clusters. As an example, you 
could take a simple random sample of packets of gumballs, and then 
taste all the gumballs in these packets. 

Cluster sampling works because each cluster is similar to the others, 
and an added advantage is that you don’t need a sampling frame of 
the whole population in order to achieve it. As an example, if you were 
surveying trees and used particular forests as your cluster, you would 
only need to know about each tree within only the forests you’d selected. 

The problem with cluster sampling is that it might not be entirely 
random. As an example, it’s likely that all of the gumballs in a packet 
will have been produced by the same factory. If there are differences 
between the factories, you may not pick these up. 
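As a rough sketch of the idea, cluster sampling picks whole clusters at random and then keeps every unit inside them. The packet names and sizes below are invented for illustration, and `cluster_sample` is a hypothetical helper, not a standard library function.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose clusters by simple random sampling, then take every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical data: 20 packets, each holding 5 gumballs.
packets = {f"packet-{i}": [f"gumball-{i}-{j}" for j in range(5)] for i in range(20)}
chosen, sample = cluster_sample(packets, n_clusters=3, seed=7)
print(chosen)
print(len(sample))  # 15
```

Notice that the function only needs the list of packets and the contents of the packets it picks, which is why no sampling frame of individual gumballs is required.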





...or even systematic sampling


With systematic sampling, you list the population in some sort of order,
and then survey every kth item, where k is some number. As an example,
you could choose to sample every 10th gumball.

Systematic sampling is relatively quick and easy, but there’s one key 
disadvantage. If there’s some sort of cyclic pattern in the population, 
your sample will be biased. As an example, if gumballs are produced 
such that every 10th gumball is red, you will end up only sampling red 
gumballs, and this could lead to you drawing misleading conclusions 
about your population. 
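The cyclic-pattern problem is easy to reproduce. The sketch below assumes a made-up production line of 100 gumballs in which every 10th gumball is red; `systematic_sample` is a hypothetical helper written for this example.

```python
def systematic_sample(items, k, start=0):
    """Take every kth item, beginning at index `start`."""
    return items[start::k]

# Hypothetical production line: every 10th gumball is red,
# the rest cycle through the other three colors.
other_colors = ["yellow", "green", "pink"]
line = ["red" if i % 10 == 9 else other_colors[i % 3] for i in range(100)]

biased = systematic_sample(line, k=10, start=9)
print(set(biased))  # {'red'}

shifted = systematic_sample(line, k=10, start=0)
print("red" in shifted)  # False
```

Only 10% of the line is red, yet one sample is entirely red and the other contains no red at all. Whenever the sampling interval lines up with a cycle in the population, the result depends entirely on where you start.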



[Figure: a line of gumballs, with every kth gumball chosen for a systematic sample.]


There are no Dumb Questions


Does using one of these methods 
of sampling guarantee that the sample 
won’t be biased? 

They don’t guarantee that the sample 
won’t be biased, but they do minimize the 
chances of this happening. By really thinking 
about your target population and how you 
can make your sample representative of it, 
you stand a much better chance of coming 
up with an unbiased, representative sample. 

Do I have to use any of these
methods? Can’t I just choose items at
random?

Choosing items at random is simple 
random sampling. Yes, this is one approach 
you can take, but one thing to be aware of is 
that there is a chance your sample will not 
be representative of the population at large. 

But why? Surely if I choose items 
at random, then my sample is bound to 
be representative of the target population. 

Not necessarily. You see, if you choose 
sampling units at random, then there’s a 
chance that purely at random, you could 
choose a sample that doesn’t effectively 
represent the target population. As an 
example, if you choose customers of the 
Statsville Health Club completely at random, 
there's a chance that you might choose only 
attendees of one particular class, or of one 
particular gender. 


There might also be a case where you think 
you’re sampling at random, when really 
you’re not. As an example, if you conduct a 
survey to find out customer satisfaction, but 
leave it up to customers whether or not they 
respond, you may well end up with a biased 
sample as customers have to be sufficiently 
motivated to respond. The customers who 
are most motivated to take part in the survey 
will be those who are either strongly satisfied 
or strongly dissatisfied. You are less likely to 
hear from those customers without strong 
feelings, yet those people may make up the 
bulk of the population. 

How about if I just increase the size 
of my sample? Will that get around bias? 

The larger your sample, the smaller the
chance that a simple random sample turns out
unrepresentative purely by luck, so this is
one way of reducing that risk. It won’t correct
bias that’s built into the way the sample is
chosen, though. The trouble is, the
larger your sample, the more cumbersome
and time-consuming it can be to gather data.

What’s the difference between 
stratified sampling and clustered 
sampling? 

With stratified sampling, you divide 
the population into different groups or strata, 
where all the units within a stratum are as 
similar to each other as possible. In other 
words, you take some characteristic or 
property such as gender, and use this as the 
basis for the strata. Once you’ve split the 
population into strata, you perform simple 
random sampling on each stratum. 


With clustered sampling, your aim is to 
divide the population into clusters, trying to 
make the clusters as alike as possible. You 
then use simple random sampling to choose 
clusters, and then sample everything in 
those clusters. 

I see. So with stratified sampling, 
you make each stratum as different as 
possible, and with clustered sampling, 
you make each cluster as similar as 
possible. 

Exactly. 

So what about systematic 
sampling? 

With systematic sampling, you choose
a number, k, and then choose every kth
item for your sample. This way of sampling
is fairly quick and easy, but it doesn’t mean
that your sample will be representative of
the population. In fact, this sort of sampling
can only be used effectively if there are no
repetitive patterns or organization in the
sampling frame.

Drawing lots sounds antiquated. Do people still do that?

It’s not as common as it used to be,
but it’s still a way of sampling.


Exercise


You've been given 10 boxes of chocolates and been asked to sample the chocolates in them.
There are white, milk, and dark chocolates in the boxes. Your target population is all of the
chocolates, and the sampling unit is one chocolate.


1. How could you apply simple random sampling to this problem? 


2. How could you apply stratified sampling? 


3. What about cluster sampling?


Exercise Solution

1. How could you apply simple random sampling to this problem? 


You could apply simple random sampling by choosing chocolates at random, either through
drawing lots or using random numbers. That way, each chocolate stands an equal chance of
being sampled.


2. How could you apply stratified sampling? 

For stratified sampling, you divide the chocolates into strata and apply simple random sampling to
each one. Each stratum comprises a group of chocolates with similar characteristics, so you could
use the different types of chocolate. One stratum could be white chocolates, another one could be
milk chocolates, and the final one could be dark chocolates.


3. What about cluster sampling? 

For cluster sampling, you divide the chocolates into groups, but this time each group needs to be
similar. Assuming each box of chocolates is similar, you could take one of the boxes, and sample all of
the chocolates in it.


Exercise

How would you go about conducting a sample survey of Mighty Gumball's super-long-lasting
gumballs? The gumballs come in four different colors, and they’re all made in the same factory. 
Assume you have to start your sample from scratch.


Exercise Solution

The target population is all of Mighty Gumball’s super-long-lasting gumballs, and the sampling unit is an
individual gumball. For the sampling frame, we ideally need some sort of numbered list of the gumballs, but
it’s likely this isn’t practical. Instead, we’ll settle for a list showing how many gumballs there are in the
population for each color.

The type of sampling you use is subjective, but we’d choose to use stratified sampling as this may be the best
way of coming up with an unbiased sample. We’d divide the gumballs into their different colors and use
simple random sampling to choose a proportionate number of each of the four colors. We would then use these
for our sample.

Don’t worry if you got a different answer. The key is to think through how you can best make your
survey representative of the population, and you may have different ideas.


BULLET POINTS

■ A population is the entire collection 
of things you are studying. 

■ A sample is a relatively small 
selection taken from the population 
that you can use to draw conclusions 
about the population itself. 

■ To take a sample, start off by defining 
your target population, the population 
you want to study. Then decide on 
your sampling units, the sorts of 
things you need to sample. Once 
you’ve done that, draw up a sampling 
frame, a list of all the sampling units 
in your target population. 

■ A sample is biased if it isn’t 
representative of your target 
population. 

■ Simple random sampling is where
you choose sampling units at random
to form your sample. This can be
with or without replacement. You can
perform simple random sampling by
drawing lots or using random number
generators.

■ Stratified sampling is where you 
divide the population into groups of 
similar units or strata. Each stratum 
is as different from the others as 
possible. Once you've done this, you 
perform simple random sampling 
within each stratum. 

■ Cluster sampling is where you 
divide the population into clusters 
where each cluster is as similar 
to the others as possible. You use 
simple random sampling to choose 
a selection of clusters. You then 
sample every unit in these clusters. 

■ Systematic sampling is where you
choose a number, k, and sample
every kth unit.


Mighty Gumball has a sample


With your help, Mighty Gumball has gathered a sample of their 
super-long-lasting gumballs. This means that rather than perform 
taste tests on the entire gumball population, they can use their 
sample instead. 



That’s great! It
means we’ll save time,
money, and teeth.


So what's next? 

We’ve looked at how we can put together a representative sample, 
but what we haven’t looked at is how we can use it. We know that 
an unbiased sample shares the same characteristics as its parent 
population, but what’s the best way of analyzing this? 

Keep reading, and we’ll show you how in the next chapter.


11 estimating populations and samples

Making Predictions



Wouldn’t it be great if you could tell what a population was 
like, just by taking one sample? 

Before you can claim full sample mastery, you need to know how to use your samples
to best effect once you’ve collected them. This means using them to accurately predict
what the population will be like and coming up with a way of saying how reliable your
predictions are. In this chapter, we’ll show you how knowing your sample helps you get to
know your population, and vice versa.


So how long does flavor really last for? 

With your help, Mighty Gumball has pulled together an unbiased 
sample of super-long-lasting gumballs. They’ve tested each of the 
gumballs in the sample and collected lots of data about how long 
gumball flavor within the sample lasts. 

There’s just one problem... 



I don’t care how long flavor lasts in the sample. 
What I do care about is flavor duration in the 
population. That way, I can say how much longer our 
gumballs last than the competing brand. 



To satisfy the CEO, we’re going to need to find both the mean
and the variance of flavor duration in the whole Mighty Gumball
population.

Here’s the data we gathered from the sample. How do you think 
we can use it to tell us what the mean of the population is? 


61.9 62.6 63.3 64.8 65.1
66.4 67.1 67.2 68.7 69.9

[Figure: Mighty Gumball’s CEO.]



Take a look at the data. How would you use this data to estimate the mean
and variance of the population? How reliable do you think your estimate will
be? Why?


Let’s start by estimating the population mean


So how can we use the results of the sample taste test to tell us the mean 
amount of time gumball flavor lasts for in the general gumball population? 


The answer is actually pretty intuitive. We assume that the mean flavor 
duration of the gumballs in the sample matches that of the population. In 
other words, we find the mean of the sample and use it as the mean for the 
population too. 

Here’s a sketch showing the distribution of the sample, and what you’d 
expect the distribution of the population to look like based on the sample. 
You’d expect the distribution of the population to be a similar shape to 
that of the sample, so you can assume that the mean of the sample and 
population have about the same value. 


[Figure: Sample vs. Population. Sketch of the sample and population distributions plotted against flavor duration.]


So are you saying that the mean of the sample
exactly matches the mean of the population?

We can’t say that they exactly match, but 
it’s the best estimate we can make. 

Based on what we know, the mean of the sample is the best 
estimate we can make for the mean of the population. It’s 
the most likely value for the population mean that we can 
come up with based on the information that we have. 

The mean of the sample is called a point estimator for
the population mean. In other words, it’s a calculation
based on the sample data that provides a good estimate for
the mean of the population.


Point estimators can approximate population parameters

Up until now, we’ve been dealing with actual values of population parameters
such as the mean, $\mu$, or the variance, $\sigma^2$. We’ve either been able to calculate
these for ourselves, or we’ve been told what they are.

This time around we don’t know the exact value of the population parameters. 

Instead of calculating them using the population, we estimate them using 
the sample data instead. To do this, we use point estimators to come up with a 
best guess of the population parameters. 

A point estimator of a population parameter is some function or calculation 
that can be used to estimate the value of the population parameter. As an 
example, the point estimator of the population mean is the mean of the sample, 
as we can use the sample mean to estimate the population mean. 
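To make the idea concrete, here is a minimal sketch that applies this point estimator to the ten flavor durations from the Mighty Gumball sample. The variable names `flavor_duration` and `mu_hat` are just labels chosen for the example.

```python
from statistics import mean

# Flavor durations from the Mighty Gumball sample.
flavor_duration = [61.9, 62.6, 63.3, 64.8, 65.1, 66.4, 67.1, 67.2, 68.7, 69.9]

# The sample mean serves as the point estimate of the population mean.
mu_hat = mean(flavor_duration)
print(round(mu_hat, 1))  # 65.7
```

The calculation is just the ordinary mean. What makes it a point estimator is how the result is used: as a best guess for a population value we can't measure directly.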



[Figure: point estimators use the sample data to estimate the population parameters.]


We differentiate between an actual population parameter and its point
estimator using the ^ symbol. As an example, we use the symbol $\mu$ to represent
the population mean, and $\hat{\mu}$ to represent its estimator. So to show that you’re
dealing with the point estimator of a particular population parameter, take the
symbol of the population parameter, and top it with a ^.

The point estimator for the population mean looks like the symbol for the
population mean, except it’s topped with a ^.


It occurs to me that we have a 
symbol for the population mean and 
one for its point estimator. Is there 
a symbol for the sample mean too? 


There’s a shorthand way of writing the sample mean. 

The symbol $\mu$ has a very precise meaning. It’s the mean of the population.
We have a different way to represent the mean of the sample so that we
don’t get confused about which mean we’re talking about. To represent the
sample mean, we use the symbol $\bar{x}$ (pronounced “x bar”). That way, we
know that if someone refers to $\mu$, they’re referring to the population mean,
and if they refer to $\bar{x}$, they’re referring to the sample mean.

$\bar{x}$ is the sample equivalent of $\mu$, and you calculate it in the same way you
would the population mean. You add together all the data in your sample,
and then divide by however many items there are. In other words, if your
sample size is n,

$$\bar{x} = \frac{\sum x}{n}$$

where $\bar{x}$ is the mean of the sample, the x are the values in the sample,
and n is the number of values in the sample.


We can use this to write a shorthand expression for the point estimator for
the population mean. Since we can estimate the population mean using the mean
of the sample, this means that

$$\hat{\mu} = \bar{x}$$

In other words, the point estimator for the population mean is the mean of the sample.

Sharpen your pencil

Use the sample data to estimate the value of the population
mean. Here’s a reminder of the data:

61.9 62.6 63.3 64.8 65.1 66.4 67.1 67.2 68.7 69.9


Sharpen your pencil Solution

We can estimate the population mean by calculating the mean of the sample.

$$\hat{\mu} = \bar{x} = \frac{61.9 + 62.6 + 63.3 + 64.8 + 65.1 + 66.4 + 67.1 + 67.2 + 68.7 + 69.9}{10} = \frac{657}{10} = 65.7$$



Dumb Questions


Surely the mean is just the mean. 
Why are there so many different symbols 
for it? 

There are three different concepts at 
work. There’s the mean of the population, 
the mean of the sample, and the point 
estimator for the population mean. 

The population mean is represented
by $\mu$. This is the sort of mean that we’ve
encountered throughout the book so far, and
you find it by adding together all the data in
the population and dividing by the size of the
population.

The sample mean is represented by $\bar{x}$.
You find it in the same way that you find $\mu$,
except that this time your data comes from
a sample. To calculate $\bar{x}$, you add together
the data in your sample, and divide by the
size of it.

The point estimator for $\mu$ is represented by
$\hat{\mu}$. It’s effectively a best guess for what you
think the population mean is, based on the
sample data.

So does that mean that we can find
$\mu$ by just taking the mean of a sample?

We can’t find the exact value of
$\mu$ using a sample, but if the sample is
unbiased, it gives us a very good estimate.
In other words, we can use the sample data
to find $\hat{\mu}$, not the true value of $\mu$ itself.

But what about if the sample is
biased? How do we come up with an
estimate for $\mu$ then?

This is where it’s important to make
your sample as unbiased as possible. If all
the data you have comes from your sample,
then that’s what you need to use as the
basis for your estimate. If your sample is
biased, then this means that your estimate
for $\mu$ is likely to be inaccurate, and it may
lead you into making wrong decisions.


Does the size of the sample matter? 


In general, the larger the size of 
your sample, the more accurate your point 
estimator is likely to be. 
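One way to see the effect of sample size is to take a tiny population and look at every possible sample of a given size. The sketch below treats the ten flavor durations as if they were the whole population, purely for illustration. The function name `spread_of_sample_means` is made up for the example.

```python
from itertools import combinations
from statistics import mean, pstdev

# Assumption for illustration only: pretend these ten values are the whole population.
population = [61.9, 62.6, 63.3, 64.8, 65.1, 66.4, 67.1, 67.2, 68.7, 69.9]

def spread_of_sample_means(n):
    """Standard deviation of the sample mean over every possible sample of size n."""
    means = [mean(sample) for sample in combinations(population, n)]
    return pstdev(means)

for n in (2, 5, 8):
    print(n, round(spread_of_sample_means(n), 2))
```

The spread of the possible sample means shrinks as n grows, so a larger sample is less likely to give an estimate that lands far from the population mean.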


$\mu$ is the mean of the population, $\bar{x}$ is the
mean of the sample, and $\hat{\mu}$ is the point
estimator for $\mu$.


BULLET POINTS

■ A point estimator is an estimate for the value of a
population parameter, derived from sample data.

■ The ^ symbol is added to the population parameter
when you’re talking about its point estimator. As an
example, the point estimator for $\mu$ is $\hat{\mu}$.

■ The mean of a sample is represented as $\bar{x}$. To find the
mean of the sample, use the formula

$$\bar{x} = \frac{\sum x}{n}$$

where x represents the values in the sample, and n is
the sample size.

■ The point estimator for the population mean is found by
calculating $\bar{x}$. In other words,

$$\hat{\mu} = \bar{x}$$

This means that if you want a good estimate for the true
value of the population mean, you can use the mean of
the sample.


This looks great! We can use your
work in our television commercials to say
how long gumball flavor lasts for, and it
beats our main rival, hands down. Just one
question: how much variation do you expect
there to be?


You’ve come up with a good estimate 
for the population mean, but what 
about the variance? 

If we can come up with a good estimate for the
population variance, then the CEO will be able to
tell how much variation in flavor duration there’s
likely to be in the gumball population, based on
the results of the sample data.


Let’s estimate the population variance

So far we’ve seen how we can use the sample mean to estimate the mean 
of the population. This means that we have a way of estimating what the 
mean flavor duration is for the super-long-lasting gumball population. 

To satisfy the Mighty Gumball CEO, we also need to come up with a good 
estimate for the population variance. 

So what can we use as a point estimator for the population variance? In
other words, how can we use the sample data to find $\hat{\sigma}^2$?



That’s easy. The variance of the
sample is bound to be the same
as that of the population. We can
use the sample variance to estimate
the population variance.


The variance of the data in the sample may not be 
the best way of estimating the population variance. 

You already know that the variance of a set of data measures the way in 
which values are dispersed from the mean. When you choose a sample, 
you have a smaller number of values than with the population, and since 
you have fewer values, there’s a good chance they’re more clustered 
around the mean than they would be in the population. More extreme 
values are less likely to be in your sample, as there are generally fewer of
them.

So what would be a better estimate of the population variance?


We need a different point estimator than sample variance

The problem with using the sample variance to estimate that of the population 
is that it tends to be slightly too low. The sample variance tends to be slightly 
less than the variance of the population, and the degree to which this holds 
depends on the number of values in the sample. If the number in the sample 
is small, there’s likely to be a bigger difference between the sample and 
population variances than if the size of the sample is large. 

What we need is a better way of estimating the variance of the population, 
some function of the sample data that gives a slightly higher result than the 
variance of all the values in the sample. 
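The claim that the variance of the sample values tends to come out too low can be checked exactly on a toy case. The sketch below assumes a made-up population of six values and lists every possible sample of size 2 drawn with replacement, then averages the two candidate estimates over all of those samples.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical tiny population, chosen only to keep the enumeration small.
population = [1, 2, 3, 4, 5, 6]
n = 2

# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

true_var = pvariance(population)                        # divide by N: the real population variance
avg_div_n = mean(pvariance(s) for s in samples)         # average of "divide by n" estimates
avg_div_n_minus_1 = mean(variance(s) for s in samples)  # average of "divide by n - 1" estimates

print(round(true_var, 4), round(avg_div_n, 4), round(avg_div_n_minus_1, 4))
```

Across all 36 possible samples, dividing by n averages out to only half of the true variance in this case, while dividing by n - 1 averages out to the true variance. The gap is at its most extreme here because the sample size is as small as it can be.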


So what’s the estimator?

Rather than take the variance of all the data in the sample to estimate the
population variance, there’s something else we can use instead. If the size of
the sample is n, we can estimate the population variance using

$$\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

In other words, we take each item in the sample, subtract the sample mean, and
then square the result. We then add all of the results together, and divide by the
number of items in the sample minus 1. This is just like finding the variance of
the values in the sample, but dividing by n - 1 instead of n.



This formula is a closer match to the value of the 
population variance. 

Dividing a set of numbers by n - 1 gives a higher result than
dividing by n, and this difference is most noticeable when n is fairly
small. This means that the formula is similar to the variance of the
sample data, but gives a slightly higher result.

The population variance tends to be higher than the variance of
the data in the sample. This means that this formula is a slightly
better point estimator for the population variance.


Variance Up Close

Knowing what formula you should use to find the variance can be confusing.
There’s one formula for population variance $\sigma^2$, and a slightly different one
for its point estimator $\hat{\sigma}^2$. So which formula should you use when?

Population variance

If you want to find the exact variance of a population and you have data for
the whole population, use

$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$

where $\mu$ is the population mean and n is the size of the population.

In this situation, you have all the data for your population. You know what
the mean is for your population, and you want to find the variance of all of
these values. This is the calculation that you’ve seen throughout this book
so far.


Estimating the population variance

If you need to estimate the variance of a population using sample data, use

$$\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean and n is the size of the sample. This is the point
estimator for the population variance, based on your sample. It’s an estimate.

Instead of calculating the variance of an actual population of n values, you
have to estimate the variance of the population, based on the sample of data
you have. To make your estimate a bit more accurate, you divide by n - 1
instead of n, as this gives a slightly higher result.

The formula for the population variance point estimator is usually written
$s^2$, so

$$\hat{\sigma}^2 = s^2$$

where

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

This is similar to using $\bar{x}$ to represent the sample mean.


Which formula's which?

Sometimes it can be tricky deciding whether you should
divide by n for the variance, or whether you should divide by
n - 1. The golden rule to remember is that dividing by n gives
you the actual variance for the set of data that you
have.

If you have the data for the entire population, then dividing
by n gives you the actual variance of the population. You
need to use the formula for $\sigma^2$ and divide by n.

If you have a sample of data from the population, then
chances are you’ll want to use this to estimate the variance of
the population. This means that you need to use the formula
for $s^2$ and divide by n - 1.



Some books tell you to 
divide by n - 1 for a 
sample, and some tell 
you to divide by n. 


This is because different 
books make different assumptions about 
what you’re using your sample for. If 
you’re using the sample to estimate the
population variance, then you need to 
divide by n - 1. You only need to divide 
by n if you want to calculate the variance 
of that exact set of values. 


If you’re taking a statistics exam, check 
the approach that your exam board takes. 
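Software makes the same distinction, so it's worth knowing which divisor a given function uses. As an illustration, Python's standard `statistics` module provides `pvariance`, which divides by n, and `variance`, which divides by n - 1. Applied to the Mighty Gumball sample:

```python
from statistics import pvariance, variance

flavor_duration = [61.9, 62.6, 63.3, 64.8, 65.1, 66.4, 67.1, 67.2, 68.7, 69.9]

# Divide by n: the exact variance of these ten values.
exact_variance = pvariance(flavor_duration)
# Divide by n - 1: the point estimate of the population variance.
estimated_variance = variance(flavor_duration)

print(round(exact_variance, 3))      # 6.232
print(round(estimated_variance, 3))  # 6.924
```

Other tools choose different defaults. NumPy's `var`, for example, divides by n unless you pass `ddof=1`, so check the documentation before relying on a result.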


Sharpen your pencil

Here's a reminder of the data from the Mighty Gumball sample.
What do you estimate the population variance to be?

61.9 62.6 63.3 64.8 65.1 66.4 67.1 67.2 68.7 69.9


Sharpen your pencil Solution

We can estimate the population variance using $s^2$. The sample mean is $\bar{x} = 65.7$, so

$$\hat{\sigma}^2 = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{62.32}{9} \approx 6.92$$


Why do I divide by n - 1 for the
sample variance? Why can’t I divide by 
n? 

You divide by n - 1 for a sample 
because most of the time, you use your 
sample data to estimate the variance of the 
population. Dividing by n - 1 gives you a 
slightly more accurate result than dividing by 
n. This is because the variance of values in 
the sample is likely to be slightly lower than 
the population variance. 



Is there some mathematical basis 
for this? 

Yes there is. It’s something that we’re 
going to touch upon at the end of the chapter, 
but hold onto that thought; it’s a good one. 

How do I remember which symbols 
are used for the population, and which 
are used for the sample? 

In general, Greek letters are used for 
the population, and normal Roman letters 
are used for the mean and variance for the 
sample. 


Is there a point estimator for the 
standard deviation in the same way that 
there is for the variance? How do I find it? 

If you need to estimate the standard
deviation, start by calculating the estimator
for the variance. The estimator for the
standard deviation is the square root of this.


Mighty Gumball has done more sampling

The Mighty Gumball CEO is so inspired by the results of the taste test that
he’s asked for another sampling exercise that he can use for his television
advertisements. This time, the CEO wants to be able to say how popular
Mighty Gumball’s candy is compared with that of their main rival.

The Mighty Gumball staff have asked a random sample of people whether 
they prefer gumballs produced by Mighty Gumball or whether they prefer 
those of their main rival. They’re hoping they can use the results to predict 
what proportion of the population is likely to prefer Mighty Gumball. 





Mighty Gumball has found that in a sample of 40 people, 32 of them prefer 
their gumballs. The other 8 prefer those of their rival. 





How would you find the proportion of people in the sample who prefer Mighty
Gumball’s candy? What distribution do you think this follows? How do you think
you could apply this to the population?


It’s a question of proportion

For the latest Mighty Gumball sample, the thing the CEO is interested in is
whether or not each person prefers Mighty Gumball confectionery to that of 
their chief rival. In other words, every person who prefers Mighty Gumball 
candy can be classified as a success. 

So how do we use the sample data to predict the proportion of successes in 
the population? 

Predicting population proportion 

If we use X to represent the number of successes in the population, then X
follows a binomial distribution with parameters n and p, where n is the number of
people in the population, and p is the proportion of successes.

In the same way that our best estimate of the population mean is the 
mean of the sample, our best guess for the proportion of successes in the 
population has to be the proportion of successes in the sample. This means 
that if we can find the proportion of people in the sample who prefer Mighty 
Gumball’s treats, we’ll have a good estimate for the proportion of people who 
prefer Mighty Gumball in the general population. 

We can find the proportion of successes in the sample by taking the total
number of people who prefer Mighty Gumball, and then dividing by
the total number of people in the sample. If we use $p_s$ to represent the
proportion of successes in the sample, then we can estimate the proportion of
successes in the population using

$$\hat{p} = p_s$$

where

$$p_s = \frac{\text{number of successes}}{\text{number in sample}}$$

Here $\hat{p}$ is the point estimator for the proportion of successes in the
population, and $p_s$ is the proportion of successes in the sample.


In other words, we can use the proportion of successes in the sample as a
point estimator for the proportion of successes in the population. In the case
of the company’s latest sample, 32 out of 40 people prefer Mighty Gumball
confectionery, which means that $p_s = 0.8$. Therefore, the point estimator for
the proportion of successes in the population is also 0.8.


So am I right in thinking that
probability and proportion are
related? They’re both represented by
p, and they sound like they’re similar.


Probability and proportion are related 

There’s actually a very close relationship between probability and 
proportion. 

Imagine you have a population for which you want to find the 
proportion of successes. To calculate this proportion, you take the 
number of successes, and divide by the size of the population. 

Now suppose you want to calculate the probability of choosing a 
success from the population at random. To derive this probability, 
you take the number of successes in the population, and divide 
by the size of the population. In other words, you derive the 
probability of getting a success in exactly the same way 
as you derive the proportion of successes. 

We use the letter p to represent the probability of success in the 
population, but we could easily use p to represent proportion 
instead — they have the same value. 
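As a small sketch of how this plays out in practice, the snippet below uses the figures from the Mighty Gumball preference sample (32 out of 40). The helper name `proportion` is invented for the example.

```python
def proportion(successes, size):
    """Number of successes divided by the size of the group."""
    return successes / size

# Sample figures from the text: 32 of 40 people prefer Mighty Gumball.
p_s = proportion(32, 40)

# The sample proportion doubles as the point estimate for the population.
p_hat = p_s
print(p_hat)  # 0.8

# Read as a probability, the chance that a randomly chosen person does not
# prefer Mighty Gumball is estimated as 1 - p_hat.
print(round(1 - p_hat, 2))  # 0.2
```

The same number is being read in two ways: as a share of the group, and as the chance of picking a success at random from it.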


p = probability = proportion



Mighty Gumball takes another sample of their super-long-lasting 
gumballs, and finds that in the sample, 10 out of 40 people prefer 
the pink gumballs to all other colors. What proportion of people 
prefer pink gumballs in the population? What’s the probability of 
choosing someone from the population who doesn’t prefer pink 
gumballs? 
Sharpen your pencil Solution


Mighty Gumball takes another sample of their super-long-lasting 
gumballs, and finds that in the sample, 10 out of 40 people prefer 
the pink gumballs to all other colors. What proportion of people 
prefer pink gumballs in the population? What’s the probability of 
choosing someone from the population who doesn’t prefer pink 
gumballs? 

We can estimate the population proportion with the sample proportion. This gives us

$$\hat{p} = p_s = \frac{10}{40} = 0.25$$

The probability of choosing someone from the population who doesn’t prefer pink gumballs is

$$P(\text{preference not pink}) = 1 - \hat{p} = 1 - 0.25 = 0.75$$
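
The point-estimate arithmetic in this exercise can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_proportion` and the variable names are made up for the example, not taken from the book.

```python
def estimate_proportion(successes: int, sample_size: int) -> float:
    """Point estimate of the population proportion: p-hat = p_s = successes / n."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return successes / sample_size


# 10 out of 40 people in the sample prefer pink gumballs.
p_hat_pink = estimate_proportion(10, 40)

# Estimated probability that a randomly chosen person does not prefer pink.
p_not_pink = 1 - p_hat_pink

print(p_hat_pink, p_not_pink)
```

The same function reproduces the earlier result for the sample in which 32 out of 40 people prefer Mighty Gumball confectionery.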
There are no Dumb Questions


So is proportion the same thing as 
probability? 

The proportion is the number of 
successes in your population, divided by 
the size of your population. This is the 
same calculation you would use to calculate 
probability for a binomial distribution. 

Does proportion just apply to the 
binomial distribution? What about other 
probability distributions? 

Out of all the probability distributions 
we’ve covered, the only one which has 
any bearing on proportion is the binomial 
distribution. It’s specific to the sorts of 
problems you have with this distribution. 


Is the proportion of the sample the 
same as the proportion of the population? 

The proportion of the sample can be 
used as a point estimator for the proportion 
of the population. It's effectively a best 
guess as to what the value of the population 
proportion is. 

Is that still the case if the sample is 
biased in some way? How do I estimate 
proportion from a biased sample? 

The key here is to make sure that
your sample is unbiased, as this is what you
base your estimate on. If your sample is
biased, this means that you will come up with
an inaccurate estimate for the population
proportion. This is the case with other point
estimators too.


So how do I make sure my sample 
is unbiased? 

Going through the points we 
raised in the previous chapter is a good 
way of making sure your sample is as 
representative as possible. The hard work 
you put in to preparing your sample is 
worth it because it means that your point 
estimators are a more accurate reflection of 
the population itself. 
BULLET POINTS

■ The point estimator for the population
variance is given by

$$\hat{\sigma}^2 = s^2$$

where $s^2$ is given by

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

■ The population proportion is represented
using p. It’s the proportion of successes within
the population.

■ The point estimator for p is given by $p_s$,
where $p_s$ is the proportion of successes in the
sample.

$$\hat{p} = p_s$$

■ You calculate $p_s$ by dividing the number of
successes in the sample by the size of the
sample.

$$p_s = \frac{\text{number of successes}}{\text{number in sample}}$$
Buy your gumballs here!

Remember the Statsville Cinema? They’ve recently been authorized
to sell Mighty Gumball products to film-goers, and it’s a move that’s 
proving popular with most of their customers. 

The trouble is, not everybody’s happy. 


I really like red gumballs,
and I’d rather not eat the
other colors. How many red
gumballs come in the box?


Introducing new jumbo boxes

The cinema sells mixed boxes of gumballs, and this weekend, 
they’re putting on a film marathon of classic films. 

The event looks like it’s going to be popular, and tickets are selling 
well. The trouble is, some people get cranky if they don’t get their 
fix of red gumballs. 


A jumbo box of gumballs is meant for sharing, and each box 
contains 100 gumballs. 25% of gumballs in the entire gumball 
population are red. 


I need 40 red gumballs to
make it through the movie.
Is that likely? If there
aren’t enough red gumballs in
the box, I’ll get another snack
instead.


We need to find the probability that in one
particular jumbo box, 40 or more of the
gumballs will be red.

Since there are 100 gumballs per box, that means we need
to find the probability that at least 40% of the gumballs in this box
are red, given that 25% of the gumball population is red.




So how does this relate to sampling? 


So far, we’ve looked at how to put together an unbiased sample, and 
how to use samples to find point estimators for population parameters. 


This time around, the situation’s different. Here, we’re told what the 
population parameters are, and we have to work out probabilities 
for one particular jumbo box of gumballs. In other words, instead 
of working out probabilities for the population, we need to work out 
probabilities for the sample proportion. 


Isn't that the sort of thing 
that we were doing before? 
What’s the big deal? 




This time, we’re looking for probabilities for a 
sample, not a population. 

Rather than work out the probability of getting particular frequencies 
or values in a probability distribution, this time around we need to find 
probabilities for the sample proportion itself. We need to figure 
out the probability of getting this particular result in this particular box 
of gumballs. 

Before we can work out probabilities for this, we need to figure out the 
probability distribution for the sample proportion. Here’s what we need 
to do: 



1. Look at all possible samples the same size as the one we’re
considering.

If we have a sample of size n, we need to consider all possible samples of
size n. There are 100 gumballs in the box, so in this case n is 100.

2. Look at the distribution formed by all the samples, and
find the expectation and variance for the proportion.

Every sample is different, so the proportion of red gumballs in each box
of gumballs will probably vary.

3. Once we know how the proportion is distributed, use it to
find probabilities.

Knowing how the proportion of successes in a sample is distributed
means we can use it to find probabilities for the proportion in a random
sample (in this case, a jumbo box of gumballs).


Let’s take a look at how to do this. 
The sampling distribution of proportions


So how do we find the distribution of the sample proportions? 

Let’s start with the gumball population. We’ve been told what the 
proportion of red gumballs is in the population, and we can represent 
this as p. In other words, p = 0.25. 



[Figure: the population of gumballs. 25% of gumballs in the gumball population are red, so p = 0.25.]


Each jumbo box of gumballs is effectively a sample of gumballs taken from the 
population. Each box contains 100 gumballs, so the sample size is 100. Let’s 
represent this with n. 

If we use the random variable X to represent the number of red gumballs in
the sample, then X ~ B(n, p), where n = 100 and p = 0.25.

The proportion of red gumballs in the sample depends on X, the number of
red gumballs in the sample. This means that the proportion itself is a random
variable. We can write this as $P_s$, where $P_s = X/n$.

[Figure: a sample of n gumballs taken from the population, with X ~ B(n, p).]
There are many possible samples we could have taken of size n. Each
possible sample would comprise n gumballs, and the number of red
gumballs in each would follow the same distribution. For each sample, the
number of red gumballs is distributed as B(n, p), and the proportion of
successes is given by X/n.

[Figure: several samples taken from the population. Each sample contains n gumballs, with X ~ B(n, p) and $P_s = X/n$.]

We can form a distribution out of all the sample proportions using all of the
possible samples. This is called the sampling distribution of proportions,
or the distribution of $P_s$.


I get it. The sampling distribution of proportions
is really a probability distribution made up of
the proportions of all possible samples of size n.
If we know how the proportions are distributed,
we’ll be able to use it to find probabilities for the
proportion of one particular sample.


Using the sampling distribution of proportions,
you can find probabilities for the proportion
of successes in a sample of size n, chosen at
random.

This means that we can use it to find the probability that the
proportion of red gumballs in one particular jumbo box of gumballs
will be at least 40%.

But before we can do that, we need to know what the expectation
and variance are for the distribution.
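
One way to get a feel for the sampling distribution of proportions is to simulate it. The sketch below is not from the book: it assumes each gumball in a box is independently red with probability p = 0.25, fills 5,000 hypothetical boxes of 100 gumballs, and records the proportion of red gumballs in each box. The names `sample_proportion`, `proportions` and `mean_prop` are made up for the example.

```python
import random


def sample_proportion(n: int, p: float, rng: random.Random) -> float:
    """Draw one sample of size n and return its proportion of successes, X/n."""
    reds = sum(1 for _ in range(n) if rng.random() < p)
    return reds / n


rng = random.Random(0)   # fixed seed so the run is repeatable
n, p = 100, 0.25         # jumbo box size and population proportion

# Each entry is the proportion of red gumballs in one simulated box.
proportions = [sample_proportion(n, p, rng) for _ in range(5000)]

mean_prop = sum(proportions) / len(proportions)
print(f"average sample proportion: {mean_prop:.3f}")
print(f"smallest: {min(proportions):.2f}, largest: {max(proportions):.2f}")
```

The simulated proportions vary from box to box, and that spread is what the expectation and variance of the distribution describe.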
So what’s the expectation of $P_s$?

So far we’ve seen how we can form a distribution from the proportions of all 
possible samples of size n. Before we can use it to calculate probabilities, we need 
to know more about it. In particular, we need to know what the expectation and 
variance is of the distribution. 

Let’s start with the expectation. Intuitively, we’d expect the proportion of red 
gumballs in the sample to match the proportion of red gumballs in the population. 
If 25% of the gumball population is red, then you’d expect 25% of the gumballs in 
the sample to be red also. 




[Margin note: Intuitively, you’d expect the proportion of red gumballs to be the same both in the sample and the population.]

So what’s the expectation of $P_s$?

We want to find $E(P_s)$, where $P_s = X/n$. In other words, we want to find the expected
value of the sample proportion, where the sample proportion is equal to the number
of red gumballs divided by the total number of gumballs in the sample. This gives us

$$E(P_s) = E\left(\frac{X}{n}\right) = \frac{E(X)}{n}$$

Now X is the number of red gumballs in the sample. If we count the number of
red gumballs as the number of successes, then X ~ B(n, p).

You’ve already seen that for a binomial distribution, E(X) = np. This means that

$$E(P_s) = \frac{E(X)}{n} = \frac{np}{n} = p$$

This result ties in with what we intuitively expect. We can expect the proportion of 
successes in the sample to match the proportion of successes in the population. 
And what’s the variance of $P_s$?

Before we can find out any probabilities for the sample proportion, we also
need to know what the variance is for $P_s$. We can find the variance in a similar
way to how we find the expectation.

So what’s $\mathrm{Var}(P_s)$? Let’s start as we did before by using $P_s = X/n$.

$$\mathrm{Var}(P_s) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{\mathrm{Var}(X)}{n^2}$$

This comes from $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$. In this case, a = 1/n.

As we’ve said before, X is the number of red gumballs in
the sample. If we count the number of red gumballs as the number of
successes, then X ~ B(n, p). This means that Var(X) = npq, where q = 1 - p, as this is the
variance for the binomial distribution. This gives us

$$\mathrm{Var}(P_s) = \frac{\mathrm{Var}(X)}{n^2} = \frac{npq}{n^2} = \frac{pq}{n}$$

Taking the square root of the variance gives us the standard deviation of $P_s$,
and this tells us how far away from p the sample proportion is likely to be. It’s
sometimes called the standard error of proportion, as it tells you what the
error for the proportion is likely to be in the sample.

$$\text{Standard error of proportion} = \sqrt{\frac{pq}{n}}$$

The standard error of proportion gets smaller as n gets larger. This means
that the more items there are in your sample, the more reliable your sample
proportion becomes as an estimator for p.
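
The two results E(P_s) = p and Var(P_s) = pq/n can be checked numerically. The sketch below is an illustration rather than part of the book's derivation: it builds the exact B(n, p) probabilities with `math.comb` and computes the mean and variance of X/n directly from them. The function name `proportion_moments` is made up for the example.

```python
from math import comb, sqrt


def proportion_moments(n: int, p: float) -> tuple[float, float]:
    """Exact mean and variance of P_s = X/n when X ~ B(n, p)."""
    q = 1 - p
    # pmf[k] = P(X = k) for the binomial distribution.
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


mean_ps, var_ps = proportion_moments(100, 0.25)
std_error = sqrt(var_ps)

print(f"E(P_s)   = {mean_ps:.6f}  (formula p    = {0.25:.6f})")
print(f"Var(P_s) = {var_ps:.6f}  (formula pq/n = {0.25 * 0.75 / 100:.6f})")
print(f"standard error of proportion = {std_error:.4f}")
```

Calling `proportion_moments` with a larger n shows the variance, and so the standard error, shrinking.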

So how can we use the expectation and variance values we found to calculate 
probabilities for the proportion? Let’s take a look. 


Find the distribution of $P_s$

So far we’ve found the expectation and variance for $P_s$, the sampling
distribution of proportions. We’ve found that if we form a distribution from
all the sample proportions, then

$$E(P_s) = p \qquad \mathrm{Var}(P_s) = \frac{pq}{n}$$

We can use this to help us find the probability that the proportion of red
gumballs in a sample of 100 is at least 40%.



Right, the distribution of $P_s$ actually depends on
the size of the samples.

Here’s a sketch of the distribution for $P_s$ when n is large.

[Figure: sketch of the distribution of $P_s$ for large n.]

Take a look at the sketch for the distribution of $P_s$ where n is large. How do you
think $P_s$ is distributed?
$P_s$ follows a normal distribution

When n is large, the distribution of $P_s$ becomes approximately normal.
By large, we mean greater than 30. The larger n gets, the closer the
distribution of $P_s$ gets to the normal distribution.

We’ve already found the expectation and variance of $P_s$, so this means
that if n is large,

$$P_s \sim N\left(p, \frac{pq}{n}\right)$$

As $P_s$ approximately follows a normal distribution for n > 30, this means that we can use
the normal distribution to solve our gumball problem. We can use the
normal distribution to calculate the probability that the proportion of red
gumballs in a jumbo box of gumballs will be at least 40%.

There’s just one thing to remember: the sampling distribution needs a
continuity correction.

Sometimes statisticians disagree about how large n needs to be.
If you’re taking a statistics exam, make sure that you check how
your exam board defines this.


$P_s$: continuity correction required

The number of successes in each of the samples is discrete, and as it’s
used to calculate proportion, you need to apply a continuity correction
when you use the normal distribution to find probabilities.

We’ve seen before that if X represents the number of successes in the
sample, then $P_s = X/n$. The normal continuity correction for X is ±(1/2).
If we substitute this in place of X in the formula $P_s = X/n$, this means
that the continuity correction for $P_s$ is given by

$$\text{Continuity correction} = \pm\frac{1/2}{n} = \pm\frac{1}{2n}$$

In other words, if you use the normal distribution to approximate
probabilities for $P_s$, make sure you apply a continuity correction of
±1/(2n). The exact continuity correction depends on the value of n.

If n is very large, the continuity correction can be left out.
As n gets larger, the continuity correction becomes very
small, and this means that it makes very little difference to the overall
probability. Some textbooks omit the continuity correction altogether.
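
To see how quickly the correction shrinks, here is a small sketch that evaluates 1/(2n) for a few sample sizes and applies it to an "at least" probability. The helper names `continuity_correction` and `corrected_lower_bound` are hypothetical, chosen only for this illustration.

```python
def continuity_correction(n: int) -> float:
    """Size of the continuity correction for a sample proportion: 1 / (2n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 / (2 * n)


def corrected_lower_bound(proportion: float, n: int) -> float:
    """For P(P_s >= proportion), the normal approximation uses
    P(P_s > proportion - 1/(2n))."""
    return proportion - continuity_correction(n)


for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: correction = {continuity_correction(n):.5f}")

# For a box of 100 gumballs, "at least 40%" starts at this value.
bound_for_box = corrected_lower_bound(0.4, 100)
print(f"corrected bound: {bound_for_box:.3f}")
```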
There are no Dumb Questions



What’s a sampling distribution? 

A sampling distribution is what you get 
if you take lots of different samples from a 
single population, all of the same size and 
taken in the same way, and then form a 
distribution out of some key characteristic of 
each sample. This means that the sampling 
distribution of proportions is what you get if 
you form a sampling distribution out of the 
proportions for each of the samples. 

Do we actually have to gather all 
possible samples? 

No, we don’t have to physically form all 
of the samples. Instead we imagine that we 
do, and then come up with expressions for 
the expectation and variance. 



So a sampling distribution has an 
expectation and variance? Why? 

A sampling distribution is a probability 
distribution in the same way as any 
other probability distribution, so it has an 
expectation and variance. 

The expectation of the sampling distribution
of proportions is like the average value of
a sample proportion; it’s what you expect
the proportion of a sample taken from a
particular population to be.

Why isn’t the variance of $P_s$ the
same as the population variance $\sigma^2$?

The variance for the sampling 
distribution of proportions describes how 
the sample proportions vary. It doesn’t 
describe how the values themselves vary. 
The variance has a different value because it 
describes a different concept. 


So what use does the sampling 
distribution of proportions have? 

It allows you to work out probabilities 
for the proportion of a sample taken from a 
known population. It gives you an idea of 
what you can expect a sample to be like. 

What does the standard error of 
proportion really mean? 

The standard error is the square root 
of the variance for the sampling distribution. 
In effect, it tells you how far away you can 
expect the sample proportion to be from the 
true value of the population proportion. This 
means it tells you what sort of error you can 
expect to have. 


BULLET POINTS

■ The sampling distribution of proportions is
what you get if you consider all possible samples
of size n taken from the same population and form
a distribution out of their proportions. We use $P_s$ to
represent the sample proportion random variable.

■ The expectation and variance of $P_s$ are defined as

$$E(P_s) = p$$

$$\mathrm{Var}(P_s) = \frac{pq}{n}$$

where p is the population proportion.

■ The standard error of proportion is the standard
deviation of this distribution. It’s given by

$$\sqrt{\mathrm{Var}(P_s)}$$

■ If n > 30, then $P_s$ approximately follows a normal distribution, so

$$P_s \sim N\left(p, \frac{pq}{n}\right)$$

for large n. When working with this, you need to
apply a continuity correction of

$$\pm\frac{1}{2n}$$
25% of the gumball population are red. What’s the probability that in a box of 100 gumballs, at
least 40% will be red? We’ll guide you through the steps.


1. If $P_s$ is the proportion of red gumballs in the box, how is $P_s$ distributed?


2. What’s the value of $P(P_s \geq 0.4)$?

Hint: Remember that you need to apply a continuity correction.
25% of the gumball population are red. What’s the probability that in a box of 100 gumballs, at
least 40% will be red? We’ll guide you through the steps.


1. If $P_s$ is the proportion of red gumballs in the box, how is $P_s$ distributed?

Let’s use p to represent the probability that a gumball is red. In other words, p = 0.25.

Let’s use $P_s$ to represent the proportion of gumballs in the box that are red.

$P_s \sim N(p, pq/n)$, where p = 0.25, q = 0.75, and n = 100. As pq/n is equal to 0.25 x 0.75 / 100 = 0.001875,
this gives us

$$P_s \sim N(0.25, 0.001875)$$

2. What’s the value of $P(P_s \geq 0.4)$? Hint: Remember that you need to apply a continuity correction.

$$P(P_s \geq 0.4) = P\left(P_s > 0.4 - \frac{1}{2 \times 100}\right) = P(P_s > 0.395)$$

As $P_s \sim N(0.25, 0.001875)$, we need to find the standard score of 0.395 so we can look up the result in
probability tables. This gives us

$$z = \frac{0.395 - 0.25}{\sqrt{0.001875}} = 3.35$$

$$P(Z > 3.35) = 1 - P(Z < 3.35) = 1 - 0.9996 = 0.0004$$

In other words, the probability that in a box of 100 gumballs, at least 40% will be red, is 0.0004.
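
The same calculation can be reproduced without probability tables. The sketch below uses Python's standard-library `statistics.NormalDist` for the normal approximation and, as a cross-check that is not part of the book's solution, also sums the exact binomial probabilities for 40 or more red gumballs. The variable names are made up for the example.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.25
q = 1 - p

# Distribution of the sample proportion: P_s ~ N(p, pq/n).
variance = p * q / n

# "At least 40%" with the continuity correction of 1/(2n).
threshold = 0.4 - 1 / (2 * n)

# Standard score of the corrected threshold.
z = (threshold - p) / sqrt(variance)

# Upper-tail probability from the standard normal distribution.
p_normal = 1 - NormalDist().cdf(z)

# Exact binomial tail P(X >= 40), for comparison with the approximation.
p_exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(40, n + 1))

print(f"variance = {variance}, z = {z:.2f}")
print(f"normal approximation: {p_normal:.4f}")
print(f"exact binomial tail:  {p_exact:.4f}")
```

The approximation and the exact tail need not agree to every decimal place this far out in the tail, but both say that a box with 40 or more red gumballs is very unlikely.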



A probability of
0.0004? Forget it. I’m
getting popcorn instead.
Sampling Distribution of Proportions Up Close

The sampling distribution of proportions is the distribution formed by
taking the proportions of all possible samples of size n. The proportion
of successes in a sample is represented by $P_s$, and its expectation and variance are

$$E(P_s) = p \qquad \mathrm{Var}(P_s) = \frac{pq}{n}$$

When n is large, say bigger than 30, the distribution of $P_s$ becomes
approximately normal, so

$$P_s \sim N\left(p, \frac{pq}{n}\right)$$

Knowing the probability distribution of $P_s$ is useful because it means that
given a particular population, we can calculate probabilities for the proportion
of successes in the sample. We can approximate this with the normal
distribution, and the larger the size of the sample, the more accurate the
approximation.


The sampling distribution continuity correction 

When you use the normal distribution in this way, it’s important to apply 
a continuity correction. This is because the number of successes in the 
sample is discrete, and it’s used in the calculation of proportion. 

If X represents the number of successes in the sample, then $P_s = X/n$. The
continuity correction for X is ±(1/2), so this means the continuity correction for $P_s$ is
given by

$$\text{Continuity correction} = \pm\frac{1}{2n}$$

In other words, if you use the normal distribution to approximate
probabilities for the sampling proportion, make sure you apply a continuity
correction of ±1/(2n).
How many gumballs?

Using the sampling distribution of proportions, you’ve successfully 
managed to find the probability of getting a certain proportion of 
successes in one particular sample. This means that you can now use 
samples to predict what the population will be like, and also use your 
knowledge of the population to make predictions about samples. 


I’m impressed. Really
impressed. Now, there’s
just one more thing that
needs sorting out...


There’s just one more problem.


The Mighty Gumball CEO has one more problem for you to work on. In
addition to selling jumbo boxes, gumballs are also sold in handy pocket-sized
packets that you can carry with you wherever you go.

According to Mighty Gumball’s statistics for the population, the mean
number of gumballs in each packet is 10, and the variance is 1. The
trouble is they’ve had a complaint. One of their most faithful customers
bought 30 packets of gumballs, and he found that the average number of
gumballs per packet in his sample is only 8.5.

The CEO is concerned that he will lose one of his best customers, and he
wants to offer him some form of compensation. The trouble is, he doesn’t
want to compensate all of his customers. He needs to know what the
probability is of this happening again.






What do you need to know in order to solve this sort of problem? 
We need probabilities for the sample mean

This is a slightly different problem from last time. We’re told what the
population mean and variance are for the packets of gumballs, and
we have a sample of packets we need to figure out probabilities for.

Instead of working out probabilities for the sample proportion, this
time we need to work out probabilities for the sample mean.

[Figure: the population in this case is all packets of gumballs; the sample is drawn from it.]

Before we can work out probabilities for the sample mean, we need to
figure out its probability distribution. Here’s what we need to do:

1. Look at all possible samples the same size as the one we’re
considering.

If we have a sample of size n, we need to consider all possible samples of
size n. There are 30 packets of gumballs, so in this case, n is 30.

2. Look at the distribution formed by all the samples, and
find the expectation and variance for the sample mean.

Every sample is different, and the number of gumballs in each packet
varies.

3. Once we know how the sample mean is distributed, use it to
find probabilities.

If we know how the means of all possible samples are distributed, we can
use it to find probabilities for the mean in a random sample, in this case
the packets of gumballs.


Let’s take a look at how to tackle this. 
The sampling distribution of the mean


So how do we find the distribution of the sample means?

Let’s start with the population of gumball packets. We’ve been told what
the mean and variance are for the population, and we’ll represent these with
$\mu$ and $\sigma^2$. We can represent the number of gumballs in a packet with X.

Each packet chosen at random is an independent observation of X,
so each gumball packet follows the same distribution. In other words, if
$X_i$ represents the number of gumballs in a packet chosen at random, then
each $X_i$ has an expectation of $\mu$ and variance $\sigma^2$.

$$E(X) = \mu \qquad \mathrm{Var}(X) = \sigma^2$$

$$E(X_i) = \mu \qquad \mathrm{Var}(X_i) = \sigma^2$$

Now let’s take a sample of n gumball packets. We can label the number of
gumballs in each packet $X_1$ through $X_n$. Each $X_i$ is an independent observation
of X, which means that they follow the same distribution. Each $X_i$ has an
expectation of $\mu$ and variance $\sigma^2$.

We can represent the mean of gumballs in these n packets of gumballs with
$\bar{X}$. The value of $\bar{X}$ depends on how many gumballs are in each of
the n packets, and to calculate it, you add up the total number of gumballs
and divide by n.

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$
There are many possible samples we could have taken of size n. Each
possible sample comprises n packets, which means that each sample
comprises n independent observations of X. The number of gumballs
in each randomly chosen packet follows the same distribution as all the
others, and we calculate the mean number of gumballs for each sample in
the same way.

[Figure: several samples of X, each with its own sample mean $\bar{X}$. Each sample mean is the mean number of gumballs per packet in that sample.]

We can form a distribution out of all the sample means from all possible
samples. This is called the sampling distribution of means, or the
distribution of $\bar{X}$.

So does this really help 
us? What does it give us? 



The sampling distribution of means gives us a way of 
calculating probabilities for the mean of a sample. 

Before you can work out the probability of any variable, you need to 
know about its probability distribution, and this means that if you want 
to calculate probabilities for the sample mean, you need to know how the 
sample means are distributed. In our particular case, we want to know 
what the probability is of there being a mean of 8.5 gumballs or fewer in a 
sample of 30 packets of gumballs. 

Just as with the sampling distribution of proportions, before we can start 
calculating probabilities, we need to know what the expectation and 
variance are of the distribution. 
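Find the expectation for $\bar{X}$

So far, we’ve looked at how to construct the sampling distribution of
means. In other words, we consider all possible samples of size n and
form a distribution out of their means.

Before we can use it to find probabilities, we need to find the
expectation and variance of $\bar{X}$. Let’s start by finding $E(\bar{X})$.

Now $\bar{X}$ is the mean number of gumballs in each packet of gumballs in
our sample. In other words,

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$

where each $X_i$ represents the number of gumballs in the i’th packet of
gumballs. We can use this to help us find $E(\bar{X})$.

$$
\begin{aligned}
E(\bar{X}) &= E\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) \\
&= E\left(\frac{1}{n}X_1 + \frac{1}{n}X_2 + \dots + \frac{1}{n}X_n\right) \\
&= E\left(\frac{1}{n}X_1\right) + E\left(\frac{1}{n}X_2\right) + \dots + E\left(\frac{1}{n}X_n\right) \\
&= \frac{1}{n}\left(E(X_1) + E(X_2) + \dots + E(X_n)\right)
\end{aligned}
$$

The first two expressions are the same, just written in a different way. We can
then split the expectation into n separate expectations because
E(X + Y) = E(X) + E(Y), and the last step comes from E(aX) = aE(X).

This means that if we know what the expectation is for each $X_i$, we’ll have
an expression for $E(\bar{X})$.

Now each $X_i$ is an independent observation of X, and we already know
that $E(X) = \mu$. This means that we can substitute $\mu$ for each $E(X_i)$ in the
above expression.

So where does this get us?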

Now each X. is an independent observation of X, and we already know 

that E(X) = ja. This means that we can substitute ja for each E(X.) in the 
above expression. 

So where does this get us? 
Let’s replace each $E(X_i)$ with $\mu$.

$$E(\bar{X}) = \frac{1}{n}(\mu + \mu + \dots + \mu) = \frac{1}{n}(n\mu) = \mu$$

The expectation of each $X_i$ is $\mu$, and there are n of these.

This means that $E(\bar{X}) = \mu$. In other words, the average of all the possible
sample means of size n is the mean of the population they’re taken from.
You’re, in effect, finding the mean of all possible means.


This is actually quite intuitive. It means that overall, you’d expect the average
number of gumballs per packet in a sample to be the same as the average
number of gumballs per packet in the population. In our situation, the mean
number of gumballs in each packet in the population is 10, so this is what
we’d expect for the sample too.
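
A quick simulation can make "the mean of all possible means" concrete. The book gives only the mean (10) and variance (1) of the number of gumballs per packet, not the distribution, so the sketch below assumes a hypothetical population in which a packet holds 8, 10 or 12 gumballs with probabilities 0.125, 0.75 and 0.125. That choice has mean 10 and variance 1 but is otherwise invented for the illustration.

```python
import random

# Hypothetical packet-size population: mean 10, variance 1 (an assumption,
# chosen only so the numbers match the population figures in the text).
values = [8, 10, 12]
weights = [0.125, 0.75, 0.125]

rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_mean(n: int) -> float:
    """Mean number of gumballs per packet in one random sample of n packets."""
    return sum(rng.choices(values, weights=weights, k=n)) / n


# Means of 5,000 simulated samples of 30 packets each.
sample_means = [sample_mean(30) for _ in range(5000)]

mean_of_means = sum(sample_means) / len(sample_means)
print(f"mean of the sample means: {mean_of_means:.3f}")
```

Individual sample means differ from 10, but their average settles very close to the population mean.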



[Margin note: If the population mean is 10 gumballs per packet, you’d expect the sample mean to be 10 gumballs per packet too.]



What else do we need to know in order to find probabilities for the sample 
mean? How do you think we can find this? 
What about the variance of $\bar{X}$?


So far we know what $E(\bar{X})$ is, but before we can figure out any
probabilities for the sample mean, we need to know what $\mathrm{Var}(\bar{X})$ is.
This will bring us one step closer to finding out what the distribution
of $\bar{X}$ is like.


The distribution of $\bar{X}$ is different from the
distribution of X.

X represents the number of gumballs in a packet. We’ve been
told what the mean number of gumballs in a packet is, and
we’ve also been given the variance.

[Figure: the distribution of X, the number of gumballs in a packet, with mean 10 and variance 1.]

$\bar{X}$ represents the mean number of gumballs in a sample, so the
distribution of $\bar{X}$ represents how the means of all possible samples are
distributed. $E(\bar{X})$ refers to the mean value of the sample means, and
$\mathrm{Var}(\bar{X})$ refers to how they vary.


Finding $\mathrm{Var}(\bar{X})$ is actually a very similar process to finding $E(\bar{X})$.
Statistics Magnets


Here are some equations for finding an expression for the variance of the sample mean.
Unfortunately, some parts of the equations have fallen off. Your task is to fill in the blanks
below by putting the magnets back in the right positions and derive the variance of the
sample mean.

Hint: Look back at how we found $E(\bar{X})$. It might help you.

$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right)$$

= Var( ____ )

= Var( ____ ) + Var( ____ ) + ... + Var( ____ )

= ____ (Var($X_1$) + Var($X_2$) + ... + Var($X_n$))

= ____

= ____
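Statistics Magnets Solution

Here are some equations for finding the variance of the sample mean. Unfortunately, some
parts of the equations have fallen off. Your task is to put the magnets back in the right
positions and derive the variance of the sample mean.

$$
\begin{aligned}
\mathrm{Var}(\bar{X}) &= \mathrm{Var}\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) \\
&= \mathrm{Var}\left(\frac{1}{n}X_1 + \frac{1}{n}X_2 + \dots + \frac{1}{n}X_n\right) \\
&= \mathrm{Var}\left(\frac{1}{n}X_1\right) + \mathrm{Var}\left(\frac{1}{n}X_2\right) + \dots + \mathrm{Var}\left(\frac{1}{n}X_n\right) \\
&= \frac{1}{n^2}\left(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right) \\
&= \frac{1}{n^2} \times n\sigma^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$

The variances can be added because the $X_i$ are independent observations, and
the factor of $1/n^2$ comes from $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$ with a = 1/n.

Don’t worry if you didn’t complete this exercise; it’s hard.
Most exam boards won’t ask you to derive this, and in real
life, you’ll just need to remember the result.
We’re just showing you where it comes from.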
[Figure: the distribution of $\bar{X}$ for small n and for large n, plotted against gumballs per packet. The larger n gets, the smaller the standard error becomes.]


Sampling Distribution of the Means Up Close


Let’s take a closer look at the sampling distribution of means.

We started off by looking at the distribution of a population X. The mean of X
is given by $\mu$, and the variance by $\sigma^2$, so $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.

We then took all possible samples of size n taken from the population X and
formed a distribution out of all the sample means, the distribution of $\bar{X}$. The
mean and variance of this distribution are given by:

$$E(\bar{X}) = \mu \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

The standard deviation of $\bar{X}$ is the square root of the variance. The standard
deviation tells you how far away from $\mu$ the sample mean is likely to be, so it’s
known as the standard error of the mean.

$$\text{Standard error of the mean} = \frac{\sigma}{\sqrt{n}}$$

The standard error of the mean gets smaller as n gets larger. This means that
the more items there are in your sample, the more reliable your sample mean
becomes as an estimator for the population mean.
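
As a rough illustration of how the standard error of the mean falls as n grows, the sketch below evaluates it for the gumball packets (population variance 1) at a few sample sizes. The function name `standard_error_of_mean` and the particular sample sizes are made up for the example.

```python
from math import sqrt


def standard_error_of_mean(variance: float, n: int) -> float:
    """Standard deviation of the sample mean: sqrt(sigma^2 / n) = sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if variance < 0:
        raise ValueError("variance cannot be negative")
    return sqrt(variance / n)


# Population variance of gumballs per packet is 1.
errors = {n: standard_error_of_mean(1.0, n) for n in (5, 30, 100, 1000)}

for n, se in errors.items():
    print(f"n = {n:>4}: standard error = {se:.4f}")
```

Because the standard error depends on the square root of n, quadrupling the sample size only halves it.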



So how is X distributed? 


So far we’ve found what the expectation and variance is for X. Before 
we can find probabilities, though, we need to know exactly how X is 
distributed. 

Let’s start by looking at the distribution of X if X is normal. 

Here’s a sketch of the distribution for X for different values of a 2 , and n, 
where X is normally distributed. What do you notice? 



For each of these combinations, the distribution of X is normal. In other 
words 

TV^csc 

If X ^ N(|i, a 2 ), then X ~ N(|i, a 2 /n) cavV»cv 


av-c a^d 

uo s^y , 仏 at 从 


But is the number of 
gumboils in a packet 
distributed normally? 
What if ifs not? 


Q 


X might not follow a normal distribution. 

We need to know how X̄ is distributed so that we can
work out probabilities for the sample mean. The trouble
is, we don't know how X is distributed.

We need to know what distribution X̄ follows if X isn't
normal.

If n is large, X̄ can still be approximated by the normal distribution

As n gets larger, X̄ gets closer and closer to a normal distribution. We've
already seen that X̄ is normal if X is normal. If X isn't normal, then we can
still use the normal distribution to approximate the distribution of X̄ if n is
sufficiently large.


In our current situation, we know what the mean and variance are for the 
population, but we don’t know what its distribution is. However, since our 
sample size is 30, this doesn’t matter. We can still use the normal distribution 
to find probabilities for X̄.

This is called the central limit theorem. 


Introducing the Central Limit Theorem 

The central limit theorem says that if you take a sample from a non-normal
population X, and if the size of the sample is large, then the distribution of
X̄ is approximately normal. If the mean and variance of the population are
μ and σ², and n is, say, over 30, then

X̄ ~ N(μ, σ²/n)

Does this look familiar? It's the same distribution that we had when X
followed a normal distribution. The only difference is that if X is normal,
it doesn't matter what size sample you use.


By the central limit theorem, if your sample of X is
large, then X̄'s distribution is approximately normal.
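
Here's a rough sketch of the central limit theorem in action. We've assumed an exponential population with mean 1 and variance 1 purely as an example of something that isn't normal; the sample size of 30 matches the text, while the trial count and seed are arbitrary.

```python
import random
from math import sqrt

def sample_means(n, trials=4000, seed=2):
    """Means of `trials` samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

mu, sigma2, n = 1.0, 1.0, 30          # population mean, variance, sample size
means = sample_means(n)

avg = sum(means) / len(means)
var = sum((m - avg) ** 2 for m in means) / (len(means) - 1)

# If the sample means are roughly N(mu, sigma2/n), about 95% of them
# should fall within 1.96 standard errors of mu.
se = sqrt(sigma2 / n)
within = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print(round(avg, 3), round(var, 4), round(within, 3))
```

The average of the sample means should land near μ, their variance near σ²/n, and the proportion within 1.96 standard errors near 0.95, even though the population itself is heavily skewed.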
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Using the central limit theorem 

So how does the central limit theorem work in practice? Let’s take a look. 

The binomial distribution 

Imagine you have a population represented by X ~ B(n, p) where n is greater
than 30. As we've seen before, μ = np, and σ² = npq.

The central limit theorem tells us that in this situation, X̄ ~ N(μ, σ²/n). To find
the distribution of X̄, we substitute in the values for the population. This means
that if we substitute in values for μ = np and σ² = npq, we get

X̄ ~ N(np, pq)

For the binomial distribution, the mean is np and the variance is npq.
Substituting these into X̄ ~ N(μ, σ²/n) gives σ²/n = npq/n = pq.


The Poisson distribution 


Now suppose you have a population that follows a Poisson distribution of
X ~ Po(λ), again where n is greater than 30. For the Poisson distribution,
μ = σ² = λ.

As before, we can use the normal distribution to help us find probabilities for X̄.
If we substitute population parameters into X̄ ~ N(μ, σ²/n), we get:

X̄ ~ N(λ, λ/n)

For the Poisson distribution, the mean and variance are both λ, so we
substitute this into the distribution.

In general, you take the distribution X̄ ~ N(μ, σ²/n) and substitute in values for
μ and σ².
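
The substitution can be sketched in a couple of lines. The helper name and the example values (λ = 4 with 36 in the sample, and a binomial with n = 40 and p = 0.5) are made up for illustration; the binomial case follows the text in using the same n for the binomial parameter and the sample size.

```python
def sampling_distribution_of_mean(mu, sigma2, n):
    """Return (mean, variance) of the sample mean: (mu, sigma2 / n)."""
    return mu, sigma2 / n

# Poisson population: mean and variance are both lambda (hypothetical values).
lam, n = 4, 36
poisson_mean, poisson_var = sampling_distribution_of_mean(lam, lam, n)

# Binomial population: mean np, variance npq (hypothetical values).
n_b, p = 40, 0.5
q = 1 - p
binom_mean, binom_var = sampling_distribution_of_mean(n_b * p, n_b * p * q, n_b)

print(poisson_mean, round(poisson_var, 4))   # lambda and lambda/n
print(binom_mean, binom_var)                 # np and pq
```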


Finding probabilities 

Since X̄ follows a normal distribution, this means that you can use standard
normal probability tables to look up probabilities. In other words, you can 
find probabilities in exactly the same way you would for any other normal 
distribution. 
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Let’s apply this to Mighty Gumball’s problem. 

The mean number of gumballs per packet is 10, and the variance is 1. If you take a sample of 30 
packets, what’s the probability that the sample mean is 8.5 gumballs per packet or fewer? We’ll 
guide you through the steps. 


1. What's the distribution of X̄?


2. What's the value of P(X̄ < 8.5)?
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Let’s apply this to Mighty Gumball’s problem. 

The mean number of gumballs per packet is 10, and the variance is 1. If you take a sample of 30 
packets, what’s the probability that the sample mean is 8.5 gumballs per packet or fewer? We’ll 
guide you through the steps. 

1. What's the distribution of X̄?

We know that X̄ ~ N(μ, σ²/n), μ = 10, σ² = 1, and n = 30, and 1/30 = 0.0333. So this gives us

X̄ ~ N(10, 0.0333)


2. What's the value of P(X̄ < 8.5)?

As X̄ ~ N(10, 0.0333), we need to find the standard score of 8.5 so that we can look up the result in
probability tables. This gives us

z = (8.5 - 10)/√0.0333
  = -8.22 (to 2 decimal places)

P(Z < z) = P(Z < -8.22)


This probability is so small it doesn't appear on probability tables. We can assume an event with a
probability this small will hardly ever happen.
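
If you'd rather check this with software than with tables, here's a small sketch using Python's standard library. The variable names are our own; the numbers come straight from the gumball problem.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma2, n = 10, 1, 30

# Distribution of the sample mean: N(mu, sigma2 / n).
x_bar_dist = NormalDist(mu, sqrt(sigma2 / n))

# Standard score of 8.5 and the probability of a sample mean at or below it.
z = (8.5 - mu) / x_bar_dist.stdev
p = x_bar_dist.cdf(8.5)

print(round(z, 2))   # -8.22
print(p)             # effectively zero
```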
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Do I need to use any continuity 
corrections with the central limit 
theorem? 

Good question, but no you don’t. 

You use the central limit theorem to find 
probabilities associated with the sample 
mean rather than the values in the sample, 
which means you don’t need to make any 
sort of continuity correction. 

Is there a relationship between 
point estimators and sampling 
distributions? 

Yes, there is. 

Let's start with the mean. The point estimator
for the population mean is x̄, which means
that μ̂ = x̄. Now, if we look at the expectation
for the sampling distribution of means, we
get E(X̄) = μ. The expectation of all the
sample means is given by μ, and we can
estimate μ with the sample mean.


BULLET POINTS 

■ 


Similarly, the point estimator for the
population proportion is p_s, the sample
proportion, which means that p̂ = p_s. If
we take the expectation of all the sample
proportions, we get E(P_s) = p. The
expectation of all the sample proportions is
given by p, and we can estimate p with the
sample proportion.

We're not going to prove it, but we get a
similar result for the variance. We have
σ̂² = s², and E(S²) = σ².

So is that a coincidence? 

No, it’s not. The estimators are chosen 
so that the expectation of a large number 
of samples, all of size n and taken in the 
same way, is equal to the true value of 
the population parameter. We call these
estimators unbiased if this holds true.

An unbiased estimator is likely to be 
accurate because on average across all 
possible samples, it’s expected to be the 
value of the true population parameter. 
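
Here's a quick simulation sketch of what "unbiased" means. It assumes a normal population with μ = 10 and σ² = 1 and tiny samples of 5; those values, the function name, and the seed are our own. The sample variance here uses the n - 1 divisor.

```python
import random

def average_estimates(mu=10.0, sigma=1.0, n=5, trials=20000, seed=4):
    """Average the sample mean and sample variance over many samples."""
    rng = random.Random(seed)
    total_mean = total_var = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
        total_mean += x_bar
        total_var += s2
    return total_mean / trials, total_var / trials

avg_mean, avg_var = average_estimates()
print(round(avg_mean, 2), round(avg_var, 2))
```

Any single sample can be well off, but averaged across lots of samples the estimates should settle close to the true mean and variance.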


How does standard error come into this?

The best unbiased estimator for a 
population parameter is generally one with 
the smallest variance. In other words, it’s the 
one with the smallest standard error. 


■ The sampling distribution of means is what
you get if you consider all possible samples of
size n taken from the same population and form
a distribution out of their means. We use X̄ to
represent the sample mean random variable.

■ The expectation and variance of X̄ are defined as

E(X̄) = μ
Var(X̄) = σ²/n

where μ and σ² are the mean and variance of the
population.

■ The standard error of the mean is the standard
deviation of this distribution. It's given by

√Var(X̄)

■ If X ~ N(μ, σ²), then X̄ ~ N(μ, σ²/n).

■ The central limit theorem says that if n is large
and X doesn't follow a normal distribution, then

X̄ ~ N(μ, σ²/n)
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hooray for gumball sampling 


Sampling saves the day! 


The work you've done is awesome! My top customer
found an average of 8.5 gumballs in a sample of 30
packets, and you've told me that getting
that result is extremely unlikely. That means I
don't have to worry about compensating disgruntled
customers, which means more money for me!


You've made a lot of progress

Not only have you been able to come up with point
estimators for population parameters based on a single
sample, you've also been able to use population parameters to calculate
probabilities for the sample. That's pretty powerful.
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Guessing with Confidence 





I put this in the 
oven for 2.5 hours, but 
if you bake yours for 
1—5 hours, you should 
be fine. 


Sometimes samples don’t give quite the right result. 

You’ve seen how you can use point estimators to estimate the precise value of the 
population mean, variance, or proportion, but the trouble is, how can you be certain that 
your estimate is completely accurate? After all, your assumptions about the population 
rely on just one sample, and what if your sample’s off? In this chapter, you’ll see another 
way of estimating population statistics, one that allows for uncertainty. Pick up your 
probability tables, and we’ll show you the ins and outs of confidence intervals. 


Mighty Gumball is in trouble


The Mighty Gumball CEO has gone ahead with a range of 
television advertisements, and he’s proudly announced exactly 
how long the flavor of the super-long-lasting gumballs lasts for, 
right down to the last second. 

Unfortunately... 


We're in trouble. Someone
else has conducted 
independent tests and come 
up with a different result. 
They're threatening to sue, 
and that will cost me money. 


Mighty Gumball used a sample of 100 gumballs to come up with a 
point estimator of 62.7 minutes for the mean flavor duration, and 
25 minutes for the population variance. The CEO announced on
primetime television that gumball flavor lasts for an average of 62.7 
minutes. It’s the best estimate for flavor duration that could possibly 
have been made based on the evidence, but what if it gave a slightly 
wrong result? 

If Mighty Gumball is sued because of the accuracy of their claims, 
they could lose a lot of money and a lot of business. They need your 
help to get them out of this. 

They need you to save them 






What do you think could have gone wrong? Should Mighty Gumball 
have used the precise value of the point estimator in their advertising? 
Why? Why not? 
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The problem with precision 

As you saw in the last chapter, point estimators are the best estimate 
we can possibly give for population statistics. You take a representative 
sample of data and use this to estimate key statistics of the population 
such as the mean, variance, and proportion. This means that the point 
estimator for the mean flavor duration of super-long-lasting gumballs 
was the best estimate we could give.

The problem with deriving point estimators is that we rely on the 
results of a single sample to give us a very precise estimate. We’ve 
looked at ways of making the sample as representative as possible by 
making sure the sample is unbiased, but we don’t know with absolute 
certainty that it’s 100% representative, purely because we’re dealing 
with a sample. 


Now hold it right there! 
Are you saying that point 
estimators are no good? 
After all that hard work? 


Point estimators are valuable, but they may give 
slight errors. 

Because we’re not dealing with the entire population, all we’re doing 
is giving a best estimate. If the sample we use is unbiased, then the 
estimate is likely to be close to the true value of the population. The 
question is, how close is close enough? 

Rather than give a precise value as an estimate for the population 
mean, there’s another approach we can take instead. We can specify 
some interval as an estimation of flavor duration rather than a very 
precise length of time. As an example, we could say that we expect 
gumball flavor to last for between 55 and 65 minutes. This still gives 
the impression that flavor lasts for approximately one hour, but it 
allows for some margin of error. 



The question is, how do we come up with the interval? It all depends
how confident you want to be in the results...

Introducing confidence intervals


Up until now, we’ve estimated the mean amount of time that gumball flavor 
lasts for by using a point estimator, based upon a sample of data. Using the 
point estimator, we’ve been able to give a very precise estimate for the mean 
duration of the flavor. Here’s a sketch showing the distribution of flavor 
duration in the sample of gumballs. 



So what happens if we specify an interval for the population mean instead? 
Rather than specify an exact value, we can specify two values we expect 
flavor duration to lie between. We place our point estimator for the mean 
in the center of the interval and set the interval limits to this value plus or 
minus some margin of error. 





The interval limits are chosen so that there’s a specified probability of the 
population mean being between a and b. As an example, you may want to 
choose a and b so that there’s a 95% chance of the interval containing the 
population mean. In other words, you choose a and b so that 


P(a < μ < b) = 0.95


We represent this interval as (a, b). As the exact value of a and b depends 
on the degree of confidence you want to have that the interval contains the 
population mean, (a, b) is called a confidence interval. 

So how do we find the confidence interval for the population mean? 
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Four steps for finding confidence intervals 


Here are the broad steps involved in finding confidence intervals. Don't
worry if you don't get what each step is about just yet, we'll go through
them in more detail soon.

1. Choose your population statistic
2. Find its sampling distribution (we looked at sampling distributions in the last chapter)
3. Decide on the level of confidence
4. Find the confidence limits


Let’s see if we can construct a confidence interval for the Mighty Gumball 
CEO that he can use in his television commercials. Let’s find a confidence 
interval for the mean amount of time gumball flavor lasts for. 


There are no Dumb Questions


So can you construct a confidence 
interval for any population statistic? 

Broadly speaking, you can construct 
a confidence interval for any population 
statistic where you know what the sampling 
distribution is like. We've looked at sampling 
distributions for the mean and proportion, so 
we can construct confidence intervals for 
both of these. 


What about the variance? Can we 
construct a confidence interval for that? 

Theoretically, yes, but we haven’t 
covered the sampling distribution for the 
variance, and we’re not going to. It's more 
common to construct confidence intervals 
for the mean and proportion, and these are 
what tend to be covered by statistics exams. 

Do these steps relate to the 
confidence interval for the mean or the 
confidence interval for the proportion? 

They’re general steps that apply to 
either. You can use them for the population 
mean and for the population proportion. 


Does it matter how the population 
is distributed? 

The key thing is the sampling 
distribution of the statistic you’re trying to 
construct a confidence interval for. If you 
want a confidence interval for the mean, you 
need to know the sampling distribution of 
means, and if you want a confidence interval 
for the proportion, you need to know the 
sampling distribution of proportions. 

The main impact the population distribution 
has on the confidence interval is the effect it 
has on the sampling distribution. We’ll see 
how later on. 




Step 1: Choose your population statistic 

The first step is to pick the statistic you want to construct a confidence 
interval for. This all depends on the problem you want to solve. 

We want to construct a confidence interval for the mean amount of
time that gumball flavor lasts for, so in this case, we want to construct a
confidence interval for the population mean, μ.


Now that we've chosen the population statistic, we can move on to the
next step.

Step 2: Find its sampling distribution

To find a confidence interval for the population mean, we need to know
what the sampling distribution is for the mean. In other words, we
need to know what the expectation and variance of X̄ are, and also what
distribution it follows.

Let's start with the expectation and variance. If we go back to the work
we did in the last chapter, then the sampling distribution of means has
the following expectation and variance:

E(X̄) = μ
Var(X̄) = σ²/n


In order to use this to find the confidence interval for μ, we substitute in
values for the population variance, σ², and the sample size, n.

But what about μ? Why
don't we substitute in a
value for μ?


We don't substitute in a value for μ as this is what
we're trying to find a confidence interval for.

We're using the sampling distribution to help us find a confidence interval
for μ, so this means that we substitute in values for everything except for μ.

By substituting in the values for σ² and n, we can use the distribution of X̄
to help us find the confidence interval. We'll show you how really soon.


There's just one problem. We don't know what the true value of σ² is. All
we have to go on is estimates based on the sample.
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Point estimators to the rescue 


So what can we use as the value for σ²?


Even though we don't know what the true value is for the population
variance, σ², we can estimate its value using its point estimator. Rather
than use σ², we can use σ̂² in its place, or s².


This means that the expectation and variance for the sampling
distribution of means is

E(X̄) = μ
Var(X̄) = s²/n


Mighty Gumball used a sample of 100 gumballs to come up with their 
estimates, and they have calculated that s 2 = 25. This means that 

Var(X̄) = s²/n
       = 25/100
       = 0.25

There's still one thing we have left to do. Before we can find the
confidence interval for μ, we need to know exactly how X̄ is distributed.


Assume that X ~ N(μ, σ²) and that the number in the sample is
large. What distribution does X̄ follow? Use E(X̄) and Var(X̄) above
to help you.
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sharpen solution 


Sharpen your pencil
Solution


Assume that X ~ N(μ, σ²) and that the number in the sample is
large. What distribution does X̄ follow? Use E(X̄) and Var(X̄) above
to help you.


If X follows a normal distribution, X̄ does too. Substituting in the point estimator for σ², we get

X̄ ~ N(μ, s²/n)

or:

X̄ ~ N(μ, 0.25)


We've found the distribution for X̄

Now that we know how X̄ is distributed, we have enough information to
move on to the next step.


Step 3: Decide on the level of confidence

The level of confidence lets you say how sure you want to be that the 
confidence interval contains your population statistic. As an example, 
suppose we want a confidence level of 95% for the population mean. 
This means that the probability of the population mean being inside the 
confidence interval is 0.95. 








How do you think the level of confidence affects the size of the confidence 
interval? 
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How to select an appropriate confidence level 


So who decides what the level of confidence should be? What’s the right 
level of confidence? 


The answer to this really depends on your situation and how confident 
you need to be that your interval contains the population statistic. A 95% 
confidence level is common, but sometimes you might want a different 
one, such as 90% or 99%. As an example, the Mighty Gumball CEO 
might want to have a higher degree of confidence that the population 
mean falls inside the confidence interval, as he intends to use it in his 
television advertisements. 

The key thing to remember is that the higher the confidence level is, 
the wider the interval becomes, and the more chance there is of the 
confidence interval containing the population statistic. 
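
To see how the level of confidence stretches the interval, here's a minimal sketch. It assumes the sampling distribution is normal and borrows the gumball standard error of 0.5 (the square root of 0.25); the function name is our own.

```python
from statistics import NormalDist

def interval_width(level, standard_error):
    """Width of a normal-based interval for a confidence level such as 0.95."""
    tail = (1 - level) / 2                  # probability left in each tail
    c = NormalDist().inv_cdf(1 - tail)      # standard score cutting off that tail
    return 2 * c * standard_error

for level in (0.90, 0.95, 0.99):
    print(level, round(interval_width(level, 0.5), 2))
```

Each step up in confidence buys you a wider interval for the same sample.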


Well, why don’t we just make the 
confidence interval really wide? 
That way we're bound to include
the population statistic. 


The trouble with making the confidence interval 
too wide is that it can lose meaning. 

As an extreme example, we could say that the mean duration of
gumball flavor is between 0 minutes and 3 days. While this is true, it 
doesn’t give you an idea how long gumball flavor really lasts for. You 
don’t know whether it lasts for seconds, minutes, or hours. 

The key thing is to make the interval as narrow as possible, but 
wide enough so you can be reasonably sure the true mean is in the 
interval. 

Let’s use a 95% confidence level for Mighty Gumball. That way, 
there’ll be a high probability that it contains the population mean. 

Now that we have the confidence level, we can move on to the final step:
finding the confidence limits.

Step 4: Find the confidence limits

The final step is to find a and b, the limits of the confidence interval,
which indicate the left and right borders of the range in which there's
a 95% probability of the mean falling. The exact value of a and b
depends on the sampling distribution we need to use and the level of
confidence that we need to have.

For this problem, we need to find the 95% confidence interval for the mean
duration of gumball flavor. This means there must be a 0.95 probability
that μ lies between the a and b that we find. We also know that X̄
follows a normal distribution.

We can find the values of a and b using the distribution of X̄. In other
words, we can use the distribution X̄ ~ N(μ, 0.25) to find a and b, such
that P(X̄ < a) = 0.025 and P(X̄ > b) = 0.025.


So does that mean we use
the normal distribution to
find the confidence interval
for μ?


As X̄ follows a normal distribution, we can use the
normal distribution to find the confidence interval.

We can do this in a similar way to how we’ve solved other problems in 
the past. We calculate a standard score, and we use standard normal 
probability tables to help us get the result we need. 
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Start by finding Z 

Before we can use normal probability tables, we need to standardize X̄.
We know that X̄ ~ N(μ, 0.25), so this means that if we standardize, we get

Z = (X̄ - μ)/√0.25 where Z ~ N(0, 1)


Here's a sketch of the standardized version of the confidence interval.

[Figure: standard normal curve centered on 0, with the central 0.95 area between z_a and z_b and 0.025 in each tail.]

We need to find z_a and z_b where P(z_a < Z < z_b) = 0.95. In other words,
the standardized confidence limits are given by z_a and z_b where
P(Z < z_a) = 0.025 and P(Z > z_b) = 0.025. We can find the values of z_a
and z_b using probability tables.


Sharpen your pencil

We need to find z_a and z_b, such that P(z_a < Z < z_b) = 0.95.

1. Use probability tables to find the value of z_a where P(Z < z_a) = 0.025.

2. Use probability tables to find the value of z_b where P(Z > z_b) = 0.025.


Sharpen your pencil
Solution

We need to find z_a and z_b such that P(z_a < Z < z_b) = 0.95.

1. Use probability tables to find the value of z_a where P(Z < z_a) = 0.025.

If we look up the probability 0.025 in standard normal probability tables, this gives us z_a = -1.96.

2. Use probability tables to find the value of z_b where P(Z > z_b) = 0.025.

To find z_b, we need to look up a value of 0.975. This gives us z_b = 1.96.
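
The table lookup can also be sketched in code. This uses the inverse of the standard normal's cumulative distribution from Python's standard library; the variable names simply mirror z_a and z_b.

```python
from statistics import NormalDist

Z = NormalDist()            # standard normal, N(0, 1)

z_a = Z.inv_cdf(0.025)      # P(Z < z_a) = 0.025
z_b = Z.inv_cdf(0.975)      # P(Z > z_b) = 0.025, so P(Z < z_b) = 0.975

print(round(z_a, 2), round(z_b, 2))   # -1.96 1.96
```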


Rewrite the inequality in terms of μ

So far, we've found a standardized version of the confidence interval.
We've found that P(-1.96 < Z < 1.96) = 0.95. In other words,

P(-1.96 < (X̄ - μ)/0.5 < 1.96) = 0.95

We can find the confidence interval for μ by
rewriting the inequality in terms of μ.

If we can rewrite

-1.96 < (X̄ - μ)/0.5 < 1.96

in the form

a < μ < b

we'll have our confidence limits for μ. This will give us a confidence interval for μ.


Pool Puzzle

Your job is to rewrite
-1.96 < (X̄ - μ)/0.5 < 1.96 and come
up with a confidence interval for
μ. Take snippets from the pool
and place them into the blank
lines. You may not use the same
snippet more than once.

This gives you the left-hand side:

-1.96 < (X̄ - μ)/0.5
-1.96 × ____ < X̄ - μ
____ + μ < X̄
μ < ____

This gives you the right-hand side:

(X̄ - μ)/0.5 < 1.96
X̄ - μ < ____ × 0.5
X̄ < ____ + μ
____ < μ

This is what you get if you put both sides of the inequality together:

X̄ - 0.98 < μ < X̄ + 0.98

Note: Each thing from the pool can only be used once!


Pool Puzzle Solution

Your job is to rewrite
-1.96 < (X̄ - μ)/0.5 < 1.96 and come
up with a confidence interval for
μ. Take snippets from the pool
and place them into the blank
lines. You may not use the same
snippet more than once.

This gives you the left-hand side:

-1.96 < (X̄ - μ)/0.5
-1.96 × 0.5 < X̄ - μ
-0.98 + μ < X̄
μ < X̄ + 0.98

This gives you the right-hand side:

(X̄ - μ)/0.5 < 1.96
X̄ - μ < 1.96 × 0.5
X̄ < 0.98 + μ
X̄ - 0.98 < μ

This is what you get if you put both sides of the inequality together:

X̄ - 0.98 < μ < X̄ + 0.98

Note: Each thing from the pool can only be used once!


Finally, find the value of X̄

Now that we've rewritten the inequality, we're very close to finding a
confidence interval for μ that describes the amount of time gumball
flavor typically lasts for. In other words, we use

P(X̄ - 0.98 < μ < X̄ + 0.98) = 0.95

Here's a quick sketch.

Our confidence limits are given by X̄ - 0.98 and X̄ + 0.98. If we knew
what to use as a value for X̄, we'd have values for the confidence limits.


I wonder if we can use the
Mighty Gumball sample in
some way. Maybe we can use
the mean of the sample.


X̄ is the distribution of sample means, so we can
use the value of x̄ from the Mighty Gumball sample.


Sharpen your pencil

The confidence limits are given by X̄ - 0.98 and X̄ + 0.98. For the
Mighty Gumball sample, x̄ is given by 62.7. Use this to come up
with values for the confidence limits.


Sharpen your pencil
Solution

The confidence limits are given by X̄ - 0.98 and X̄ + 0.98. For the
Mighty Gumball sample, x̄ is given by 62.7. Use this to come up
with values for the confidence limits.

The confidence limits are given by X̄ - 0.98 and X̄ + 0.98. If we substitute in the mean of the sample, we
get confidence limits of 62.7 - 0.98 and 62.7 + 0.98. In other words, our confidence interval is (61.72, 63.68).


You've found the confidence interval

Congratulations! You've found your first confidence interval. You
found that there's a 95% chance that the interval
(61.72, 63.68) contains the population mean for flavor duration.
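
Here's the whole Mighty Gumball calculation as a short sketch, using the sample figures from the text (x̄ = 62.7, s² = 25, n = 100) and c = 1.96 for 95% confidence. The variable names are our own.

```python
from math import sqrt

x_bar = 62.7    # sample mean flavor duration, in minutes
s2 = 25         # point estimator for the population variance
n = 100         # number of gumballs in the sample
c = 1.96        # standard score for a 95% confidence level

standard_error = sqrt(s2 / n)       # square root of Var(X-bar) = 0.25
margin = c * standard_error         # 1.96 x 0.5 = 0.98
lower, upper = x_bar - margin, x_bar + margin

print(f"({lower:.2f}, {upper:.2f})")   # (61.72, 63.68)
```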


That's fantastic news! That
means I can update the fine 
print on our advertisements. 
That should handle any lawsuits. 


Using confidence intervals in the television advertisement rather
than point estimators means that the CEO can give an accurate
estimate for how long flavor lasts, but without having to give
a single precise figure. It makes allowances for any margin of error there
might be in the sample.

Let's summarize the steps

Let’s look back at the steps we went through in order to construct the 
confidence interval. 

The first thing we did was choose the population statistic that
we needed to construct a confidence interval for. We needed to find a
confidence interval for the mean duration of gumball flavor, and this
meant that we needed to construct a confidence interval for μ.

Once we'd figured out which population statistic we needed to construct a
confidence interval for, we had to find its sampling distribution.
We found the expectation and variance of the sampling distribution of
means, substituting in values for every statistic except for μ. We then
figured out that we could use a normal distribution for X̄.

After that, we decided on the level of confidence we needed for the 
confidence interval. We decided to use a confidence level of 95%. 

Finally, we had to find the confidence limits for the confidence 
interval. We used the level of confidence and sampling distribution to 
come up with a suitable confidence interval. 


So does that mean I have to 
go through the same process 
every single time I want to 
construct a confidence interval? 


We can take some shortcuts. 

Constructing confidence intervals can be a repetitive process, so there 
are some shortcuts you can take. It all comes down to the level of 
confidence you want and the distribution of the test statistic. 

Let’s take a look at some of the shortcuts we can take. 
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Handy shortcuts for confidence intervals 

Here are some of the shortcuts you can take when you calculate confidence
intervals. All you need to do is look at the population statistic you want to
find, look at the distribution of the population and the conditions, and then
slot in the population statistic or its estimator. The value c depends on the
level of confidence.

Population statistic: μ
Population distribution: Normal
Conditions: You know what σ² is; n is large or small; x̄ is the sample mean
Confidence interval: (x̄ - c σ/√n, x̄ + c σ/√n)

Population statistic: μ
Population distribution: Non-normal
Conditions: You know what σ² is; n is large (at least 30); x̄ is the sample mean
Confidence interval: (x̄ - c σ/√n, x̄ + c σ/√n)

Population statistic: μ
Population distribution: Normal or non-normal
Conditions: You don't know what σ² is; n is large (at least 30); x̄ is the sample mean; s² is the sample variance
Confidence interval: (x̄ - c s/√n, x̄ + c s/√n)

Population statistic: p
Population distribution: Binomial
Conditions: n is large; p_s is the sample proportion; q_s is 1 - p_s
Confidence interval: (p_s - c √(p_s q_s/n), p_s + c √(p_s q_s/n))

What's the interval in general?

In general, the confidence interval is given by

statistic ± (margin of error)

The margin of error is given by the value of c multiplied
by the standard deviation of the test statistic.

Level of confidence    Value of c
90%                    1.64
95%                    1.96
99%                    2.58

margin of error = c × (standard deviation of statistic)
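
The shortcut for the mean boils down to very little code. This is only a sketch: the dictionary and function names are our own, and it covers just the normal-based shortcuts for μ with the three values of c listed in the table.

```python
from math import sqrt

# Values of c from the table (normal-based intervals only).
C_VALUES = {90: 1.64, 95: 1.96, 99: 2.58}

def mean_interval(x_bar, variance, n, level=95):
    """Return (lower, upper) for the population mean.

    `variance` is sigma squared if you know it, or the sample variance
    s squared if you don't and n is large.
    """
    margin = C_VALUES[level] * sqrt(variance / n)   # c x standard deviation
    return x_bar - margin, x_bar + margin

# The gumball flavor sample again, this time at 99% confidence.
low, high = mean_interval(62.7, 25, 100, level=99)
print(round(low, 2), round(high, 2))   # 61.41 63.99
```

Compared with the 95% interval of (61.72, 63.68), the 99% interval is wider, which is the price of the extra confidence.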
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Mighty Gumball took a sample of 50 gumballs and found that in the sample, the proportion of red 
gumballs is 0.25. Construct a 99% confidence interval for the proportion of red gumballs in the
population.

Solution


Mighty Gumball took a sample of 50 gumballs and found that in the sample, the proportion of red
gumballs is 0.25. Construct a 99% confidence interval for the proportion of red gumballs in the
population.


The confidence interval for the population proportion is given by

(p_s - c √(p_s q_s/n), p_s + c √(p_s q_s/n))

We need to find the 99% confidence interval, so c = 2.58. The proportion of red gumballs is 0.25, so
p_s = 0.25 and q_s = 0.75. n = 50. This gives us

(0.25 - 2.58 × √(0.25 × 0.75/50), 0.25 + 2.58 × √(0.25 × 0.75/50))

= (0.25 - 2.58 × 0.0612, 0.25 + 2.58 × 0.0612)

= (0.25 - 0.158, 0.25 + 0.158)

= (0.092, 0.408)
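
The same arithmetic can be checked with a few lines of code. The numbers are the ones from the exercise; the variable names are our own.

```python
from math import sqrt

p_s = 0.25      # sample proportion of red gumballs
n = 50          # sample size
c = 2.58        # value of c for a 99% confidence level
q_s = 1 - p_s

standard_error = sqrt(p_s * q_s / n)
margin = c * standard_error
lower, upper = p_s - margin, p_s + margin

print(round(standard_error, 4), round(margin, 3))   # 0.0612 0.158
print(round(lower, 3), round(upper, 3))             # 0.092 0.408
```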
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When we found the expectation 
and variance for X̄ earlier, why did we
substitute in the point estimator for σ²
and not μ?

We didn't substitute x̄ for μ because
we needed to find the confidence interval
for μ. We needed to find some sort of
expression involving μ that we could use to
find the confidence interval.

Why did we use x̄ as the value of
X̄?

The distribution of X̄ is the sampling
distribution of means. You form it by taking
every possible sample of size n from the
population, and then forming a distribution
out of all the sample means.

x̄ is the particular value of the mean taken
from our sample, so we use it to help us find
the confidence interval.

What's the difference between the
confidence interval and the confidence
level?

The confidence level is the
probability that your statistic is contained
within the confidence interval. It's normally
given as a percentage, for example, 95%.
The confidence interval gives the lower and
upper limit of the interval itself, the actual
range of numbers.


We've found that the 95%
confidence interval for μ is (61.72, 63.68).
What does that really mean?

What it means is that if you were to 
take many samples of the same size and 
construct confidence intervals for all of them, 
then 95% of your confidence intervals would 
contain the true population mean. You know 
that 95% of the time, a confidence interval 
constructed in this way will contain the 
population mean. 

In the shortcuts, do the values of c 
apply to every confidence interval? 

They apply to all of the shortcuts 
we’ve shown you so far because all of 
these shortcuts are based on the normal 
distribution. This is because the sampling 
distribution in all of these cases follows the 
normal distribution. 

I’ve sometimes seen "a" instead
of "c" in the shortcuts for the confidence
intervals. Is that wrong?

Not at all. The key thing is that whether
you refer to it as "a" or "c", it represents
a value that you can substitute into your
confidence interval to give you the right
confidence level. The values stay the same
no matter what you call it.


So are all confidence intervals 
based on the normal distribution? 

No, they're not. We’ll look at intervals 
based on other distributions later on. 

Why did we go through all those 
steps when all we have to do is slot 
values into the shortcuts? 

We went through the steps so that you 
could see what was going on underneath 
and understand how confidence intervals 
are constructed. Most of the time, you’ll just 
have to substitute in values. 

Do I need continuity corrections 
when I’m working with confidence 
intervals? 

Theoretically, you do, but in practice, 
they’re generally omitted. This means 
that you can just substitute values into 
the shortcuts to come up with confidence 
intervals. 
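The repeated-sampling meaning of a 95% confidence interval, described in the answers above, can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with mean 62.7 and standard deviation 5, each sample has 100 values, and c = 1.96. These values are hypothetical and were chosen only so that a typical interval resembles (61.72, 63.68).

```python
import math
import random


def coverage(mu=62.7, sigma=5.0, n=100, c=1.96, trials=2000, seed=1):
    """Fraction of simulated intervals (x̄ - c·σ/√n, x̄ + c·σ/√n) that contain mu."""
    rng = random.Random(seed)
    margin = c * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / trials


print(coverage())
```

The fraction printed should be close to 0.95: roughly 95% of the intervals built this way contain the true population mean, while any single interval either contains it or doesn't.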
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confidence interval dilemma: part deux 


Just one more problem...


Mighty Gumball has one last problem for you to sort out. One of the 
candy stores selling gumballs wants to determine how much gumballs 
typically weigh, as they find that their customers often buy gumballs 
based on weight rather than quantity. If the store can figure out the 
typical weight of a gumball, they can use this information to boost sales. 


That means I need you to come up with
a confidence interval for gumball weight,
but as it’s just for one store, I don’t want
to sample a large number of gumballs.


Mighty Gumball has taken a representative sample of 10
gumballs and weighed each one. In their sample, x̄ = 0.5 oz and
s² = 0.09.


How do we find the confidence interval? 

Step 1: Choose your population statistic 

The first step is to pick the statistic we want to construct a confidence 
interval for. We want to construct a confidence interval for the mean 
weight of gumballs, so we need to construct a confidence interval for the 
population mean, μ.

As we need to find the confidence interval for μ, the next
step is to find its sampling distribution, the distribution of X̄.






Assuming the weight of each gumball in the population follows a normal 
distribution, how would you go about creating a 95% confidence interval for 
this data? Hint: look at the table of confidence interval shortcuts and see which 
situation we have here. 
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constructing confidence intervals 


Step 2: Find its sampling distribution

So what’s the distribution of X̄?


That’s easy. X is normal, so
that means that X̄ has to follow
a normal distribution, too.


The normal distribution isn’t a good approximation 
for every situation. 

All of the sampling distributions we’ve seen so far either follow a 
normal distribution or can be approximated by it. The trouble is that 
we can’t use the normal distribution for every single confidence interval. 
Unfortunately, this situation is one of them. 


So why can’t we use the normal distribution here?

When sample sizes are large, the normal distribution is ideal for finding 
confidence intervals. It gives accurate results, irrespective of how the 
population itself is distributed. 



Here we have a different situation. Even though X itself is distributed
normally, X̄ isn’t.

The problem is that we don’t know what the true variance of the
population is, so this means we have to estimate σ² using the
sample data. We can easily do this using point estimators, but
there’s a catch: the size of the sample is so small that there are
likely to be significant errors in our estimate, much larger errors
than if we used a larger sample of gumballs. The potential errors
we’re dealing with mean that the normal distribution won’t give
us accurate enough probabilities for X̄, which means it won’t give
us an accurate confidence interval.

So what sort of distribution does X̄ follow? It actually follows a
t-distribution. Let’s find out more.
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introducing t-distributions 


X̄ follows the t-distribution when the sample is small

The t-distribution is a probability distribution that specializes in exactly the
sort of situation we have here. It’s the distribution that X̄ follows where the
population is normal, σ² is unknown, and you only have a small sample at
your disposal.


The t-distribution looks like a smooth, symmetrical curve, and its exact
shape depends on the size of the sample. When the sample size is large, it
looks like the normal distribution, but when the sample size is small, the
curve is flatter and has slightly fatter tails. It takes one parameter, v, where
v is equal to n - 1. n is the size of the sample, and v is called the number
of degrees of freedom.

Let’s take a look at this. Here’s a sketch of the t-distribution for different
values of v. Can you see how the value of v affects the shape of the
distribution?
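Since the sketch itself does not survive in this text, the effect of v on the shape can be seen numerically instead. The following is a minimal illustration, not the book's own method: it evaluates the standard density formula for the t-distribution at the center (x = 0) and out in the tail (x = 3), and compares it with the standard normal density. The function names are illustrative assumptions.

```python
import math


def t_pdf(x, v):
    """Density of the t-distribution with v degrees of freedom."""
    coeff = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coeff * (1 + x * x / v) ** (-(v + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Height at the center and in the tail for several values of v
for v in (1, 4, 9, 100):
    print(v, round(t_pdf(0, v), 4), round(t_pdf(3, v), 4))
print("normal", round(normal_pdf(0), 4), round(normal_pdf(3), 4))
```

For small v the curve is lower in the middle and higher in the tails than the normal curve; as v grows, the two become hard to tell apart.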


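[Figure: sketch of the t-distribution for different values of v. Annotations: we’ll look at
degrees of freedom in more depth in Chapter 14; the shape of the t-distribution depends on
the size of the sample and the value of v, and the two are related.]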


A shorthand way of saying that T follows the t-distribution with v 
degrees of freedom is 


T ~ t(v)

T is the test statistic. You’ll see how to calculate it on the next page.
t(v) means we’re using the t-distribution with v degrees of freedom.


The t-distribution works in a similar way to the normal distribution. We 
start off by converting the limit of the probability area into a standard 
score, and then we use probability tables to get the result we want. 

Let’s start with the standard score. 
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Find the standard score for the t - distribution 

We calculate the standard score for the t-distribution in the same way we 
did for the normal distribution. As with the normal distribution, we
standardize by subtracting the expectation of the sampling distribution 
and then dividing by its standard deviation. The only difference is that 
we represent the result with T instead of Z, as we’re going to use it with 
the t-distribution. 


We need to find the distribution of X̄, so this means we need to use the
expectation and standard deviation of X̄. The expectation of X̄ is μ, and
the standard deviation is σ/√n. As we need to estimate the value of σ
with s, this means that the standard score for the t-distribution is given by

T = (X̄ - μ) / (s/√n)

This is the same as before: subtract the mean and divide by the standard
deviation. μ is the population mean, and it’s what we’re finding the
confidence interval for.

All we need to do is substitute in the values for x̄, s, and n.


Sharpen your pencil


Let’s see if you can apply this to the Mighty Gumball
sample. There are 10 gumballs in the sample, where
x̄ = 0.5 oz and s² = 0.09. What's the value of v and
what’s T?
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sharpen solution 


Sharpen your pencil Solution


Let’s see if you can apply this to the Mighty Gumball
sample. There are 10 gumballs in the sample, where
x̄ = 0.5 oz and s² = 0.09. What's the value of v, and
what’s T?


There are 10 gumballs in the sample, and v = n - 1. This means that the value of v is 9.

T is given by

T = (X̄ - μ) / (s/√n)

  = (X̄ - μ) / √(0.09/10)

  = (X̄ - μ) / 0.0949
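The solution above can be reproduced in code. This is a minimal sketch; the helper name `t_score` is an illustrative assumption, and the value μ = 0.45 passed to it is purely hypothetical, since the real population mean is what we are trying to estimate.

```python
import math


def t_score(x_bar, mu, s2, n):
    """Return the standard score T = (x̄ - μ) / (s/√n) and the degrees of freedom v."""
    v = n - 1
    standard_error = math.sqrt(s2 / n)  # s/√n written as √(s²/n)
    return (x_bar - mu) / standard_error, v


# Denominator of T for the gumball sample: √(0.09/10)
standard_error = math.sqrt(0.09 / 10)
print(round(standard_error, 4))

# Hypothetical population mean of 0.45 oz, used only to show the call
t_value, v = t_score(x_bar=0.5, mu=0.45, s2=0.09, n=10)
print(v, round(t_value, 3))
```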


Step 3: Decide on the level of confidence

So what level of confidence should we use for Mighty Gumball? 
Remember, the level of confidence says how sure you want to be that 
the confidence interval contains the population statistic, and it helps us 
figure out how wide the confidence interval needs to be. As before, let’s 
have a confidence level of 95% for the population mean. This means 
that the probability of the population mean being inside the confidence 
interval is 0.95. 



[Figure: t-distribution curve with the central area shaded. The level of confidence is 0.95.]


Now that we have the level of confidence, we can move onto the final
step, finding the confidence interval for μ.
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Step 4: Fiwd the confidence limits 

You find confidence limits with the t-distribution in a similar way to
how you find them with the normal distribution. Your confidence
interval is given by

(x̄ - t s/√n, x̄ + t s/√n)

This is the same as we had before, just replace c with t,

where

P(-t ≤ T ≤ t) = 0.95

The probability is 0.95, as we want to find the 95% confidence interval.


We can find the value of t using t-distribution probability tables.

[Figure: t-distribution curve with an area of 0.95 between -t and t, and 0.025 in each tail.]

Using t-distribution probability tables

t-distribution probability tables give you the value of t where 
P(T > t) = p. In our case, p = 0.025. 

To find t, use the first column to look up v, and the top row to look up 
p. The place where they intersect gives the value of t. As an example, 
if we look up v = 7 and p = 0.05, we get t = 1.895. 

Once you’ve found the value of t, you can use it to find your 
confidence interval. 




[Figure: t-distribution curve with an upper tail of area 0.05 shaded to the right of t.]

Tail probability p

v     .25    .20    .15    .10    .05    .025   .02    .01    .005   .0025  .001   .0005
1     1.000  1.376  1.963  3.078  6.314  12.71  15.89  31.82  63.66  127.3  318.3  636.6
2     .816   1.061  1.386  1.886  2.920  4.303  4.849  6.965  9.925  14.09  22.33  31.60
3     .765   .978   1.250  1.638  2.353  3.182  3.482  4.541  5.841  7.453  10.21  12.92
4     .741   .941   1.190  1.533  2.132  2.776  2.999  3.747  4.604  5.598  7.173  8.610
5     .727   .920   1.156  1.476  2.015  2.571  2.757  3.365  4.032  4.773  5.893  6.869
6     .718   .906   1.134  1.440  1.943  2.447  2.612  3.143  3.707  4.317  5.208  5.959
7     .711   .896   1.119  1.415  1.895  2.365  2.517  2.998  3.499  4.029  4.785  5.408
8     .706   .889   1.108  1.397  1.860  2.306  2.449  2.896  3.355  3.833  4.501  5.041
9     .703   .883   1.100  1.383  1.833  2.262  2.398  2.821  3.250  3.690  4.297  4.781
10    .700   .879   1.093  1.372  1.812  2.228  2.359  2.764  3.169  3.581  4.144  4.587

This is where v = 7 and p = .05 meet: t = 1.895.
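If no printed table is to hand, the same values can be approximated numerically. The sketch below is an illustration using only the Python standard library, not the way the tables were produced: it integrates the t-distribution density with Simpson's rule to get P(T > t), then uses bisection to find the t that gives the required tail probability p. The function names and the search limit of 100 are assumptions, so it is only suitable for values of t below that limit.

```python
import math


def t_pdf(x, v):
    """Density of the t-distribution with v degrees of freedom."""
    coeff = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coeff * (1 + x * x / v) ** (-(v + 1) / 2)


def upper_tail(t, v, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - total * h / 3  # the curve is symmetrical, so P(T > 0) = 0.5


def t_critical(v, p, hi=100.0):
    """Find t such that P(T > t) = p by bisection, assuming 0 < t < hi."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, v) > p:
            lo = mid  # tail still too large, so t must be bigger
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(7, 0.05), 3))   # compare with the row v = 7, column .05
print(round(t_critical(9, 0.025), 3))  # compare with the row v = 9, column .025
```

The printed values can be compared against the corresponding entries in the table above.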
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confidence interval exercise 


See if you can find the 95% confidence interval for the average weight of gumballs.
There are 10 gumballs in the sample where x̄ = 0.5 oz and s² = 0.09.


1. The confidence interval for μ is given by (x̄ - t s/√n, x̄ + t s/√n).
Use standard probability tables to find the value of t.



2. Use this to find the confidence interval for μ.
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constructing confidence intervals 


The t-distribution vs. the normal distribution


So why did we use the 
t-distribution for this problem? 
Why couldn’t we have used the 
normal distribution instead? 


The t-distribution is more accurate when we 
have to estimate the population variance for 
small samples. 

The problem with basing our estimate of σ² on just a small
sample is that it may not accurately reflect the true value of 
the population variance. This means we need to make some 
allowance for this in our confidence interval by making the 
interval wider. 

The shape of the t-distribution varies in line with the value of 
v. As it takes the size of the sample into account, this means that 
it allows for any uncertainty we may feel about the accuracy of 

our estimate for σ². When n is small, the t-distribution gives a
wider confidence interval than the normal distribution, which 
makes it more appropriate for small-sized samples. 


Handy shortcuts for confidence intervals - the t-distribution

Here’s a quick reminder of when you need to use the t-distribution, and what the
confidence interval is for μ. Just substitute in your values.


Population statistic:     μ
Population distribution:  Normal
Conditions:               You don’t know what σ² is
                          n is small (less than 30)
                          x̄ is the sample mean
                          s² is the sample variance
Confidence interval:      (x̄ - t(v) s/√n, x̄ + t(v) s/√n)


To find t(v), you need to look it up in t-distribution probability tables. To do this, use
v = n - 1 and your level of confidence to find the critical region.
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exercise solution 


Solution


See if you can find the 95% confidence interval for the average weight of gumballs.
There are 10 gumballs in the sample where x̄ = 0.5 oz and s² = 0.09.


1. The confidence interval for μ is given by (x̄ - t s/√n, x̄ + t s/√n).
Use standard probability tables to find the value of t.


There are 10 gumballs in the sample, so v = 9. We want to find the 95% confidence interval, so this
means we look up 0.025 in the t-distribution probability table, with 9 degrees of freedom. This
gives us t = 2.262.


2. Use this to find the confidence interval for μ.

We find the confidence interval by substituting values for x̄, s, n and t into (x̄ - t s/√n, x̄ + t s/√n).
This gives us

(x̄ - t s/√n, x̄ + t s/√n) = (0.5 - 2.262 × √(0.09/10), 0.5 + 2.262 × √(0.09/10))

= (0.5 - 2.262 × 0.0949, 0.5 + 2.262 × 0.0949)

= (0.5 - 0.215, 0.5 + 0.215)

= (0.285, 0.715)
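The calculation for the gumball sample can be repeated in code, and the result compared with what the normal distribution would have given. This is a sketch only: the function name is an illustrative assumption, t = 2.262 is the table value for v = 9 and p = 0.025, and 1.96 is the usual normal value for a 95% confidence level.

```python
import math


def t_confidence_interval(x_bar, s2, n, t):
    """Return (lower, upper) limits x̄ ± t·s/√n, given the table value t."""
    margin = t * math.sqrt(s2 / n)
    return x_bar - margin, x_bar + margin


# Gumball sample: n = 10, x̄ = 0.5 oz, s² = 0.09, t(9) = 2.262 for a 95% level
lower, upper = t_confidence_interval(x_bar=0.5, s2=0.09, n=10, t=2.262)
print(round(lower, 3), round(upper, 3))

# For comparison only: the margin if the normal value 1.96 were used instead
t_margin = 2.262 * math.sqrt(0.09 / 10)
z_margin = 1.96 * math.sqrt(0.09 / 10)
print(round(t_margin, 3), round(z_margin, 3))
```

The t-based margin comes out larger than the normal-based one, which is the extra allowance for estimating the variance from only 10 gumballs.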
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Mighty Gumball has noticed a problem with their gumball dispensers. They have taken a sample 
of 30 machines, and found that the mean number of malfunctions is 15. Construct a 99% 
confidence interval for the number of malfunctions per month. 
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exercise solution 



Mighty Gumball has noticed a problem with their gumball dispensers. They have taken a sample 
of 30 machines, and found that the mean number of malfunctions is 15. Construct a 99% 
confidence interval for the number of malfunctions per month. 


The number of breakdowns per month is modelled by a Poisson distribution. As there are 30 machines, we can
find the confidence interval using (x̄ - c s/√n, x̄ + c s/√n).

We need to find the 99% confidence interval, which means c = 2.58. For the Poisson distribution, the
expectation and variance are both equal to λ, so x̄ = 15 and s² = 15.

The confidence interval is given by

(x̄ - c s/√n, x̄ + c s/√n) = (15 - 2.58 × √(15/30), 15 + 2.58 × √(15/30))

= (15 - 2.58 × √0.5, 15 + 2.58 × √0.5)

= (15 - 2.58 × 0.7071, 15 + 2.58 × 0.7071)

= (15 - 1.824, 15 + 1.824)

= (13.176, 16.824)
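The same numbers fall out of a short script. This is a minimal sketch of the working above; the function name is an illustrative assumption, and the step of estimating s² by x̄ relies on the Poisson property that the expectation and variance are equal.

```python
import math


def poisson_mean_interval(x_bar, n, c):
    """Return (lower, upper) limits x̄ ± c·s/√n, with s² estimated by x̄."""
    # For a Poisson distribution the variance equals the mean, so s² = x̄
    margin = c * math.sqrt(x_bar / n)
    return x_bar - margin, x_bar + margin


# 30 machines, mean of 15 malfunctions, 99% confidence level (c = 2.58)
lower, upper = poisson_mean_interval(x_bar=15, n=30, c=2.58)
print(round(lower, 3), round(upper, 3))
```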



There are no Dumb Questions


When does X̄ follow a t-distribution?

X̄ follows a t-distribution when the population is normal, the
sample size is small, and you need to estimate the population
variance using the sample data.


What happens to the confidence interval if the size of the 
sample, n, changes? 

If n decreases, then your confidence interval gets wider, and if
n increases, your confidence interval gets narrower. 


In general, what happens to my confidence interval if the 
confidence level changes? 


If your confidence level goes down, then your confidence 
interval gets narrower. If your confidence level goes up, then your 
confidence interval gets wider. As an example, a 95% confidence 
interval will be narrower than a 99% confidence interval for the same 
set of data. 


Confidence intervals take the form 

statistic ± margin of error

where the margin of error is equal to c times the standard deviation 
of the statistic. 

The standard deviation of the statistic depends on the size of the 
sample, and it gets smaller as n gets larger. In other words, the 
margin of error gets smaller as n gets larger, and larger as n gets 
smaller. 


In general, a smaller sample leads to a wider confidence interval, and 
a larger sample to a narrower one. 
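The effect of n and of the confidence level on the margin of error can be seen by tabulating c × s/√n. This is a simplified sketch: it assumes the sample standard deviation stays at 0.3 (the square root of 0.09 from the gumball example) and uses the fixed normal values 1.96 and 2.58 for every n, which ignores the t-distribution adjustment for small samples.

```python
import math


def margin_of_error(c, s, n):
    """Margin of error: c times the standard deviation of the statistic, s/√n."""
    return c * s / math.sqrt(n)


s = 0.3  # assumed sample standard deviation, √0.09
sizes = (10, 40, 160, 640)

# Margin for a 95% level (c = 1.96) and a 99% level (c = 2.58) at each sample size
margins_95 = [margin_of_error(1.96, s, n) for n in sizes]
margins_99 = [margin_of_error(2.58, s, n) for n in sizes]

for n, m95, m99 in zip(sizes, margins_95, margins_99):
    print(n, round(m95, 4), round(m99, 4))
```

Each time n is multiplied by four, the margin halves, and at every sample size the 99% margin is wider than the 95% one.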
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YouVg found the confidence intervals! 

You’ve made a lot of progress in this chapter, and the result of it is 
that you now know two ways of estimating population statistics. 

The first way of estimating population statistics is to use point 
estimators. Point estimators give you a way of estimating the 
precise value for the population statistics. It’s the best guess you can 
possibly make based on the sample data. 


You also know how to come up with confidence intervals for 
the population statistics. Rather than come up with a very precise 
estimate for the population statistics, you now know how to find a 
range of values for the population statistic that you can feel truly 
confident about. 



You’re great! I’ll tell the candy shop what the
confidence interval is for the mean weight of
gumballs, as that’s just what they wanted to
know. They’ll be able to sell more gumballs to their
customers, and that means increased profits!
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Look At The Evidence 





Not everything you’re told is absolutely certain. 

The trouble is, how do you know when what you’re being told isn’t right? Hypothesis
tests give you a way of using samples to test whether or not statistical claims are likely 
to be true. They give you a way of weighing the evidence and testing whether extreme 
results can be explained by mere coincidence, or whether there are darker forces at 
work. Come with us on a ride through this chapter, and we’ll show you how you can use 
hypothesis tests to confirm or allay your deepest suspicions. 








the ultimate snoring remedy? 


Is YOUR SNORING GETTING YOU DOWN? 


Then you need new SnoreCull, 

THE ULTIMATE REMEDY FOR SNORING, 

SnoreCull cures 90% 

OF SNORERS WITHIN 2 WEEKS, 



Statsville's new miracle drug 

Statsville’s leading drug company has produced a new remedy for 
curing snoring. Frustrated snorers are flocking to their doctors in 
hopes of finding nightly relief. 

The drug company claims that their miracle drug cures 90% of 
people within two weeks, which is great news for the people with 
snoring difficulties. The trouble is, not everyone’s convinced. 
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rm not sure these claims are 
true. If they were, more of 
my patients would be cured. 


The doctor at the Statsville Surgery has been prescribing SnoreCull
to her patients, but she’s disappointed by the results. She decides to
conduct her own trial of the drug.

She takes a random sample of 15 snorers and puts them on a course
of SnoreCull for two weeks. After two weeks, she calls them back in
to see whether their snoring has stopped.

Here are the results: 



Cured?      Yes   No
Frequency   11    4



If the drug cures 90% of people, how many people in the sample 
of 15 snorers would you expect to have been cured? What sort of 
distribution do you think this follows? 
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"This is how mdhy people v/ev^e 
actually cured by SnoreCull.


sharpen solution 


Sharpen your pencil Solution


If the drug cures 90% of people, how many people in the sample
of 15 snorers would you expect to have been cured? How does
this compare with the doctor’s results? What sort of distribution
do you think this follows?


90% of 15 is 13.5, so you’d expect 14 people to be cured. Only 11 people in the doctor’s sample were cured, which is
much lower than the result you’d expect.

There are a specific number of trials and the doctor is interested in the number of successes, so the number of
successes follows a binomial distribution. If X is the number of successes then X ~ B(15, 0.9).
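The distribution X ~ B(15, 0.9) named in this solution can be tabulated directly. This is a small sketch using the standard binomial formula; the helper name `binomial_pmf` is an illustrative assumption.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 15, 0.9           # 15 snorers, claimed cure rate of 90%
expected = n * p         # expected number of people cured
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

print("expected:", expected)
for k in range(10, 16):  # the upper end of the distribution
    print(k, round(pmf[k], 4))
```

The output shows how much probability the drug company's claim puts on each number of cures, including the doctor's result of 11.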


So what’s the problem?

Here’s the probability distribution for how many people the drug
company says should have been cured by the snoring remedy.


[Figure: bar chart of P(X = x) against x for x = 10 to 15.]


The number of people cured by SnoreCull in the doctor’s sample
is actually much lower than you’d expect it to be. Given the claims
made by the drug company, you’d expect 14 people to be cured,
but instead, only 11 people have been.

So why the discrepancy?
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Does that mean that the drug 
company is telling lies about their 
product? Shouldn’t the drug have
cured more of the doctor’s patients?



The drug company might not be deliberately telling lies, 
but their claims might be misleading. 

It’s possible that the tests of the drug company were flawed, and this might
have resulted in misleading claims being made about SnoreCull. They may
have inadvertently conducted flawed or biased tests on SnoreCull, which
resulted in them making inaccurate predictions about the population.

If the success rate of SnoreCull is actually lower than 90%, this would explain
why only 11 people in the sample were cured.


But can we really be certain 
that the drug company is at fault? 
Maybe the doctor was unlucky. 



The drug company’s claims might actually be accurate. 

Rather than the drug company being at fault, it’s always possible that the 
patients in the doctor’s sample may not have been representative of the 
snoring population as a whole. It’s always possible that the snoring remedy does 
cure 90% of snorers, but the doctor just happens to have a higher proportion 
of people in her sample whom it doesn’t cure. In other words, her sample might 
be biased in some way, or it could just come down to there being a small 
number of patients in the sample. 

How do you think we can resolve this? How can we 
determine whether to trust the claims of the drug 
company, or accept the doctor’s doubts instead? 
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the hypothesis testing process 


Resolving the conflict from 50,000 feet 


So how do we resolve the conflict between the doctor and the drug 
company? Let’s take a very high level view of what we need to do. 


We can resolve the conflict between the drug company and the doctor by 
putting the claims of the drug company on trial. In other words, we’ll accept 
the word of the drug company by default, but if there’s strong evidence 
against it, we’ll side with the doctor instead. 

Here’s what we’ll do: 


1. Examine the claim

2. Examine the evidence
   See how much evidence we need to reject the drug company’s claim, and
   check this against the evidence we have. We do this by looking at how
   rare the results would be if the drug company is correct.

3. Make a decision
   Depending on the evidence, accept or reject the claims of
   the drug company.



In general, this process is called hypothesis testing, as you take a
hypothesis or claim and then test it against the evidence. Let’s look at 
the general process for this. 
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The six steps for hypothesis testing 

Here are the broad steps that are involved in hypothesis testing. We’ll go 
through each one in detail in the following pages. 


1. Decide on the hypothesis you’re going to test
   (This is the claim that we’re putting on trial.)

2. Choose your test statistic
   (We need to choose the statistic that best tests the claim.)

3. Determine the critical region for your decision
   (We need a certain level of certainty.)

4. Find the p-value of the test statistic

5. See whether the sample result is within the critical region
   (We then see if it’s within our bounds of certainty.)

6. Make your decision


Why all the
formality? It’s
obvious there's
something going on.


We need to make sure we properly test the drug 
claim before we reject it. 

That way we’ll know we’re making an impartial decision either way, 
and we’ll be giving the claim a fair trial. What we don’t want to do
is reject the claim if there’s insufficient evidence against it, and this 
means that we need some way of deciding what constitutes sufficient 
evidence. 
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null and alternate hypotheses 


Step 1: Decide on the hypothesis

Let’s start with step one of the hypothesis test, and look at the key claim we 
want to test. This claim is called a hypothesis. 

The drug company's claim 

According to the drug company, SnoreCull cures 90% of patients within
2 weeks. We need to accept this position unless there is sufficiently strong 
evidence to the contrary. 



The claim that we’re testing is called the null hypothesis. It’s represented
by H₀, and it’s the claim that we’ll accept unless there is strong evidence
against it.


I'm the null
hypothesis. I’m the
default position. If you
think I’m wrong, gimme
the evidence.


So what’s the null hypothesis for SnoreCull?

The null hypothesis for SnoreCull is the claim of the drug company: that it
cures 90% of patients. This is the claim that we’re going to go along with,
unless we find strong evidence against it.

We need to test whether at least 90% of patients are cured by the drug, so this
means that the null hypothesis is that p = 90%.


This is the null hypothesis for the SnoreCull trial:

H₀: p = 0.9
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So whaf s the alternative? 


We’ve looked at what the claim is we’re going to test, the null hypothesis, but 
what if it’s not true? What’s the alternative? 


The doctor's perspective 


The doctor’s view is that the claims of the drug company are too good to be 
true. She doesn’t think that as many as 90% of patients are cured. She thinks 
it’s far more likely that the cure rate is actually less than 90%. 

The counterclaim to the null hypothesis is called the alternate hypothesis.
It’s represented by H₁, and it’s the claim that we’ll accept if there’s strong
enough evidence to reject H₀.



The alternate hypothesis for SnoreCull


The alternate hypothesis for SnoreCull is the claim you’ll accept if the drug
company’s claim turns out to be false. If there’s sufficiently strong evidence
against the drug company, then it’s likely that the doctor is right.

The doctor believes that SnoreCull cures less than 90% of people, so this
means that the alternate hypothesis is that p < 90%.


This is the alternate hypothesis for the SnoreCull trial:

H₁: p < 0.9


Now that we have the null and alternate hypotheses for the SnoreCull
hypothesis test, we can move onto step 2.
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no dumb questions 


There are no Dumb Questions


Why are we assuming the null hypothesis is 
true and then looking for evidence that it’s false? 

When you conduct a hypothesis test, you, in effect, 
put the claims of the null hypothesis on trial. You give 
the null hypothesis the benefit of the doubt, but then you 
reject it if there is sufficient evidence against it. It’s a bit 
like putting a prisoner on trial in front of a jury. You only 
sentence the prisoner if there is strong enough evidence 
against him. 

Do the null hypothesis and alternate 
hypothesis have to be exhaustive? Should they 
cover all possible outcomes? 

No, they don’t. As an example, our null hypothesis 
is that p = 0.9, and our alternate hypothesis is that 
p < 0.9. Neither hypothesis allows for p being greater 
than 0.9. 


Isn’t the sample size too small to do this 
hypothesis test? 

Even though the sample size is small, we can still 
perform hypothesis tests. It all comes down to what test 
statistic you use — and we’ll come to that on the next 
page. 

So are hypothesis tests used to prove whether 
or not claims are true? 

Hypothesis tests don’t give absolute proof. They 
allow you to see how rare your observed results actually 
are, under the assumption that your null hypothesis 
is true. If your results are extremely unlikely to have 
happened, then that counts as evidence that the null 
hypothesis is false. 


When hypothesis testing, you assume the null hypothesis
is true. If there’s sufficient evidence against it, you reject
it and accept the alternate hypothesis.
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Step Z: Choose your test statistic 

Now that you’ve determined exactly what it is you’re going to test, you 
need some means of testing it. You can do this with a test statistic. 

The test statistic is the statistic that you use to test your hypothesis. It’s 
the statistic that’s most relevant to the test. 


What’s the test statistic for SnoreCull?


In our hypothesis test, we want to test whether SnoreCull cures 90% of
people or more. To test this, we can look at the probability distribution
according to the drug company, and see whether the number of
successes in the sample is significant.


If we use X to represent the number of people cured in the sample, this
means that we can use X as our test statistic. There are 15 people in the
sample, and the probability of success according to the drug company
is 0.9. As X follows a binomial distribution, this means that the test
statistic is actually:

X ~ B(15, 0.9)

This is the test statistic for our hypothesis test. We came up with this
distribution earlier.


I’m confused. Why are
we saying the probability
of success is 0.9? Surely
we don't know that yet.


We choose the test statistic according to H₀, the null
hypothesis.

We need to test whether there is sufficient evidence against the null
hypothesis, and we do this by first assuming that H₀ is true. We then look
for evidence that contradicts H₀. For the SnoreCull hypothesis test, we
assume that the probability of success is 0.9 unless there is strong evidence
against this being true.

To do this, we look at how likely it is for us to get the results we did, 
assuming the probability of success is 0.9. In other words, we take the 
results of the sample and examine the probability of getting that result. 
We do this by finding a critical region. 
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finding critical regions 


Step 3: Determine the critical region

The critical region of a hypothesis test is the set of values that present the 
most extreme evidence against the null hypothesis. 

Let’s see how this works by taking another look at the doctor’s sample. If 90% 
or more people had been cured, this would have been in line with the claims 
made by the drug company. As the number of people cured decreases, the 
more unlikely it becomes that the claims of the drug company are true. 

Here’s the probability distribution: 


[Figure: probability distribution of the number of people cured, for x = 10 to 15. Annotations:
the fewer people are cured, the more likely it is that the drug company’s claims are wrong; if
90% of the people in the sample had been cured, we would have accepted the drug company’s claims.]


At what point caw we reject the drug company claims? 

The fewer people there are in the sample who are successfully cured by 
SnoreGull, the stronger the evidence there is against the claims of the drug 
company. The question is, at what point does the evidence become so strong 
that we confidently reject the null hypothesis? At what point can we reject the 
claim that SnoreGull cures 90% of snorers? 

What we need is some way of indicating at what point we can reasonably 
reject the null hypothesis, and we can do this by specifying a critical region. 

If the number of snorers cured falls within the critical region, then we’ll say 
there is sufficient evidence to reject the null hypothesis. If the number of 
snorers cured falls outside the critical region, then we’ll accept that there isn’t 
sufficient evidence to reject the null hypothesis, and we’ll accept the claims 
of the drug company. We’ll call the cut off point for the critical region c, the 
critical value. 








To find the critical region, first decide on the significance level 

Before we can find the critical region of the hypothesis test, we first 
need to decide on the significance level. The significance level of 
a test is a measure of how unlikely you want the results of the sample 
to be before you reject the null hypothesis H₀. Just like the confidence 
level for a confidence interval, the significance level is given as a 
percentage. 


As an example, suppose we want to test the claims of the drug 
company at a 5% level of significance. This means that we choose 
the critical region so that the probability of fewer than c snorers 
being cured is less than 0.05. It’s the lowest 5% of the probability 
distribution. 


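[Figure: probability distribution with the critical region, the lowest 5% of the distribution, lying below the critical value c, and the remaining 95% above it. If H₀ is true, we are 95% certain that the number of snorers cured will fall within the upper 95% region.]
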

The significance level is normally represented by the Greek letter α. The 
lower α is, the more unlikely the results in your sample need to be before 
we reject H₀. 


So what significance level should we use? 


Let’s use a significance level of 5% in our hypothesis test. This 
means that if the number of snorers cured in the sample 
is in the lowest 5% of the probability distribution, then we 
will reject the claims of the drug company. If the number of 
snorers cured lies in the top 95% of the probability distribution, 
then we’ll decide there isn’t enough evidence to reject the null 
hypothesis, and accept the claims of the drug company. 


If we use X to represent the number of snorers cured, then we 
define the critical region as being values such that 


P(X < c) < α 


where 


α = 5% 


Vital Statistics: Significance level 

The significance level is represented by α. It’s a way of saying how unlikely 
you want your results to be before you’ll reject H₀. 
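
To see what the definition P(X < c) < α means in practice, here is a minimal sketch that searches for the critical value of a lower-tailed binomial test. It assumes the test statistic X ~ B(15, 0.9) used for the doctor’s sample; the function names `binom_pmf` and `lower_tail_critical_value` are our own, chosen only for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_tail_critical_value(n: int, p: float, alpha: float) -> int:
    """Largest c such that P(X < c) < alpha for X ~ B(n, p)."""
    cumulative = 0.0  # running value of P(X < c)
    c = 0
    # Moving from c to c + 1 adds P(X = c) to the lower-tail probability.
    while c <= n and cumulative + binom_pmf(c, n, p) < alpha:
        cumulative += binom_pmf(c, n, p)
        c += 1
    return c


c = lower_tail_critical_value(15, 0.9, 0.05)
print(c)  # critical value for the lower-tailed test at the 5% level
```

With these assumptions, the critical region is the set of counts strictly below the returned value. Because the binomial distribution is discrete, the probability of the critical region is usually somewhat less than α rather than exactly equal to it.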



Critical Region Up Close 


When you’re constructing a critical region for your test, another thing you need to 
be aware of is whether you’re conducting a one-tailed or two-tailed test. Let’s 
look at the difference between the two, and what impact this has on the critical 
region. 


One-tailed tests 

A one-tailed test is where the critical region falls at one end 
of the possible set of values in your test. You choose the level 
of the test, represented by α, and then make sure that the 
critical region reflects this as a corresponding probability. 

The tail can be at either end of the set of possible values, and 
the end you use depends on your alternate hypothesis H₁. 


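[Figure: a one-tailed test at the α level. The critical region, with probability α, lies in the lower tail, and the remaining 100% - α lies above it. Here we’re using the lower tail.]
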

If your alternate hypothesis includes a < sign, then use the 
lower tail, where the critical region is at the lower end of the 
data. 

If your alternate hypothesis includes a > sign, then use the 
upper tail, where the critical region is at the upper end of 
the data. 


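[Figure: a one-tailed test using the upper tail. The region with probability 100% - α lies below the critical value c, and the critical region, with probability α, lies above it. The critical region is still at the α level.]
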


We’re using a one-tailed test for the SnoreGull hypothesis 
test with the critical region in the lower tail, as our alternate 
hypothesis is that p < 0.9. 


Two-tailed tests 


A two-tailed test is where the critical region is split over both 
ends of the set of values. You choose the level of the test α, 
and then make sure that the overall critical region reflects this 
as a corresponding probability by splitting it into two. Both 
ends contain α/2, so that the total is α. 

You can tell if you need to use a two-tailed test by looking 
at the alternate hypothesis H₁. If H₁ contains a ≠ sign, then 
you need to use a two-tailed test as you are looking for some 
change in the parameter, rather than an increase or decrease. 


[Figure: a two-tailed test, where the critical region is split over the two tails. Each tail has probability α/2, and the central region has probability 100% - α.]




We would have used a two-tailed test for SnoreGull if our 
alternate hypothesis had been p ≠ 0.9. We would have had to 
check whether significantly more or significantly fewer than 
90% of patients had been cured. 
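
The way the tail choice moves the critical value can be sketched in a few lines. This example assumes, purely for illustration, a test statistic that follows the standard normal distribution; `critical_values` is a hypothetical helper name, and `statistics.NormalDist` comes from the Python standard library.

```python
from statistics import NormalDist


def critical_values(alpha: float, tail: str) -> tuple[float, ...]:
    """Critical value(s) for a standard normal test statistic.

    tail is "lower" (H1 uses <), "upper" (H1 uses >) or "two" (H1 uses ≠).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "lower":
        return (z.inv_cdf(alpha),)
    if tail == "upper":
        return (z.inv_cdf(1 - alpha),)
    if tail == "two":
        # Split alpha evenly: alpha/2 in each tail.
        return (z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


print(critical_values(0.05, "lower"))
print(critical_values(0.05, "upper"))
print(critical_values(0.05, "two"))
```

Notice that the two-tailed critical values sit further from the center than the one-tailed ones, because each tail only gets α/2.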


Step 4: Find the p-value 

Now that we’ve looked at critical regions, we can move on to step 4, finding 
the p-value. 

A p-value is the probability of getting a value up to and including the one 
in your sample in the direction of your critical region. It’s a way of taking 
your sample and working out whether the result falls within the critical 
region for your hypothesis test. In other words, we use the p-value to say 
whether or not we can reject the null hypothesis. 





How do we find the p-value? 

How we find the p-value depends on our critical region and our test statistic. 
For the SnoreGull test, 11 people were cured, and our critical region is 
the lower tail of the distribution. This means that our p-value is P(X ≤ 11), 
where X is the distribution for the number of people cured in the sample. 


As the significance level of our test is 5%, this means that if P(X ≤ 11) is 
less than 0.05, then the value 11 falls within the critical region, and we can 
reject the null hypothesis. 


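[Figure: probability distribution split into a lower tail of probability 0.05 and a remaining region of probability 0.95. If P(X ≤ 11) is less than 0.05, then that means that 11 is inside the critical region.]
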



Sharpen your pencil 


We know from step 2 that X ~ B(15, 0.9). What’s P(X ≤ 11)? 
Sharpen your pencil Solution 


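We know from step 2 that X ~ B(15, 0.9). What’s P(X ≤ 11)? Is 11 
inside or outside the critical region? 


\begin{align*}
P(X \le 11) &= 1 - P(X \ge 12) \\
&= 1 - \left({}^{15}C_{12} \times 0.1^{3} \times 0.9^{12} + {}^{15}C_{13} \times 0.1^{2} \times 0.9^{13} + {}^{15}C_{14} \times 0.1 \times 0.9^{14} + 0.9^{15}\right) \\
&= 1 - (0.1285 + 0.2669 + 0.3432 + 0.2059) \\
&= 1 - 0.9445 \\
&= 0.0555
\end{align*}

Note: 15C15 = 1, and so does 0.1^0, so we’re just left with 0.9^15 for the last term. 
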


We’ve found the p-value 


To find the p-value of our hypothesis test, we had to find P(X ≤ 11). This 
means that the p-value is 0.0555. 
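
If you’d rather let a computer do the arithmetic, the same p-value can be checked with a short script. This is a sketch using only the Python standard library; `binom_cdf` is a name we’ve made up for the helper.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


# 11 people cured out of 15, with a claimed success probability of 0.9.
p_value = binom_cdf(11, 15, 0.9)
print(round(p_value, 4))  # 0.0556
```

The script gives roughly 0.0556 rather than 0.0555 because the hand calculation rounds each term to four decimal places before subtracting. Either way, the p-value is just above 0.05.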


Do I always calculate 
p-values in the same way? 
What if my critical region 
had been the upper tail? 


A p-value is the probability of getting the results in the 
sample, or something more extreme, in the direction of 
the critical region. 

In our hypothesis test for SnoreGull, the critical region is the lower tail of 
the probability distribution. In order to see whether 11 people being cured 
of snoring is in the critical region, we calculated P(X ≤ 11), as this is the 
probability of getting a result at least as extreme as the results of our sample 
in the direction of the lower tail. 










Had our critical region been the upper tail of the probability distribution 
instead, we would have needed to find P(X ≥ 11). We would have counted 
more extreme results as being greater than 11, as these would have been 
closer to the critical region. 
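
Here is a small sketch of how the direction of the critical region changes which probabilities get added up. The function `p_value` and its `tail` argument are illustrative names of our own, and the distribution is assumed to be the X ~ B(15, 0.9) from the SnoreGull test.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_value(observed: int, n: int, p: float, tail: str) -> float:
    """Probability of the observed result or something more extreme,
    in the direction of the critical region."""
    if tail == "lower":
        values = range(0, observed + 1)   # P(X <= observed)
    elif tail == "upper":
        values = range(observed, n + 1)   # P(X >= observed)
    else:
        raise ValueError("tail must be 'lower' or 'upper'")
    return sum(binom_pmf(k, n, p) for k in values)


lower = p_value(11, 15, 0.9, "lower")
upper = p_value(11, 15, 0.9, "upper")
print(round(lower, 4), round(upper, 4))
```

In both cases the observed value itself is included, which is why the two probabilities add up to slightly more than 1.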


Step 5: Is the sample result in the critical region? 

Now that we’ve found the p-value, we can use it to see whether the result 
from our sample falls within the critical region. If it does, then we’ll have 
sufficient evidence to reject the claims of the drug company. 


Our critical region is the lower tail of the probability distribution, and 
we’re using a significance level of 5%. This means that we can reject the 
null hypothesis if our p-value is less than 0.05. As our p-value is 0.0555, this 
means that the number of people cured by SnoreGull in the sample doesn’t 
fall within the critical region. 





Step 6: Make your decision 

We’ve now reached the final step of the hypothesis test. We can decide 
whether to accept the null hypothesis, or reject it in favor of the alternative. 

The p-value of the hypothesis test falls just outside the critical region of 
the test. This means that there isn’t sufficient evidence to reject the null 
hypothesis. In other words: 


We accept the claims of the drug company 
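
Steps 5 and 6 boil down to a single comparison, which can be sketched as follows. The function name `decide` and the wording of its return values are our own; the rule itself is the one described above, rejecting the null hypothesis only when the p-value is below the significance level.

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the decision rule of a hypothesis test at significance level alpha."""
    if p_value < alpha:
        # The sample result lies inside the critical region.
        return "reject H0"
    # The sample result lies outside the critical region.
    return "insufficient evidence to reject H0"


# SnoreGull with the sample of 15: p-value 0.0555, tested at the 5% level.
print(decide(0.0555, 0.05))
```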






So what did we just do? 

Let’s summarize what we just did. 

First of all, we took the claims of the drug company, which the doctor had 
misgivings about. We used these claims as the basis of a hypothesis test. We 
formed a null hypothesis that the probability of curing a patient is 0.9, and 
then we applied this to the number of people in the doctor’s sample. 

We then decided to conduct a test at the 5% level, using the success rate in 
the doctor’s sample. We looked at the probability of 11 people or fewer being 
cured, and checked to see whether the probability of this was less than 5%, 
or 0.05. In other words, we looked at the probability of getting a result this 
extreme, or even more so. 

Finally, we found that at the 5% level, there wasn’t strong enough evidence to 
reject the claims of the drug company. 


But those results 
aren’t what the doctor 
wants. Can’t we test at a 
different level? 


Once you’ve fixed the significance level of the test, 
you can’t change it. 

The test needs to be completely impartial. This means that you decide 
what level you need the test to be at, based on what level of evidence 
you require, before you look at what evidence you actually have. 

If you were to look at the amount of evidence you have before deciding 
on the level of the test, this could influence any decisions you made. 

You might be tempted to decide on a specific level of test just to get the 
result you want. This would make the outcome of the test biased, and 
you might make the wrong decision. 


BULLET POINTS 

■ In a hypothesis test, you take a claim and test it 
against statistical evidence. 

■ The claim that you’re testing is called the null 
hypothesis. It’s represented as H₀, and it’s the 
claim that’s accepted unless there’s strong statistical 
evidence against it. 

■ The alternate hypothesis is the claim we’ll accept 
if there’s strong enough evidence against H₀. It’s 
represented by H₁. 

■ The test statistic is the statistic you use to test your 
hypothesis. It’s the statistic that’s most relevant to 
the test. You choose the test statistic by assuming 
that H₀ is true. 

■ The significance level is represented by α. It’s a way 
of saying how unlikely you want your results to be 
before you’ll reject H₀. 

■ The critical region is the set of values that presents 
the most extreme evidence against the null 
hypothesis. You choose your critical region by 
considering the significance level and how many tails 
you need to use. 

■ A one-tailed test is when your critical region lies 
in either the upper or the lower tail of the data. A 
two-tailed test is when it’s split over both ends. 

■ You choose your tail by looking at your alternate 
hypothesis. 

■ A p-value is the probability of getting the result 
of your sample, or a result more extreme in the 
direction of your critical region. 

■ If the p-value is less than the significance level, your 
sample result lies in the critical region, and you have 
sufficient reason to reject your null hypothesis. 
Otherwise, you have insufficient evidence. 


There are no Dumb Questions 


Q: What significance level should I normally test at? 

A: It all depends how strong you want the evidence to be before 
you reject the null hypothesis. The stronger you want the evidence to 
be, the lower your significance level needs to be. 

The most common significance level is 5%, although you sometimes 
see tests at the 1% level. Testing at the 1% level means that you require 
stronger evidence than if you test at the 5% level. 


Q: Does the significance level have anything in common with 
the level of confidence for confidence intervals? 

A: Yes, they have a lot in common. When you construct a 
confidence interval for a population parameter, you want to have 
a certain degree of confidence that the population parameter lies 
between two limits. As an example, if you have a 95% level of 
confidence, this means that the probability that the population 
parameter lies between the two limits is 0.95. 

The level of significance reflects the probability that values will lie 
outside a certain limit. As an example, a significance level of 5% 
means that your critical region must have a probability of 0.05. 


I still have doubts. 
I wonder what would 
happen if I took a 
larger sample... 


What if the sample size is larger? 

So far the doctor has conducted her trial using a sample of just 15 people, 
and on the basis of this, there was insufficient evidence to reject the 
claims of the drug company. 

It’s possible that the size of the sample wasn’t large enough to get an 
accurate result. The doctor might get more reliable results by using a 
larger sample. 

Here are the results from the doctor’s new trial: 


Cured?       Yes    No 
Frequency    80     20 



I want to conduct a new 
hypothesis test using 
these new results. 


We want to determine whether the new data will 
make a difference in the outcome of the test. 

Let’s run through another hypothesis test, this time with the larger 
sample. 





What’s the null hypothesis of this new problem? 
What’s the alternate hypothesis? 


Hypothesis Magnets 


It’s time to do another hypothesis test. There are a number of steps 
you need to run through to perform the hypothesis test, but can you 
remember what the order is? Put the magnets into the right order. 








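Hypothesis Magnets Solution 

It’s time to do another hypothesis test. There are a number of steps 
you need to run through to perform the hypothesis test, but can you 
remember what the order is? Put the magnets into the right order. 

1. Decide on the hypothesis you’re going to test 
2. Choose your test statistic 
3. Determine the critical region for your decision 
4. Find the p-value of the test statistic 
5. See whether the sample result is within the critical region 
6. Make your decision 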


Let's conduct another hypothesis test 

The doctor still has misgivings about the claims made by the drug company. 
Let’s conduct a hypothesis test based on the new data. 


Step 1: Decide on the hypotheses 

We need to start off by finding the null hypothesis and alternate 
hypothesis of the SnoreGull trial. As a reminder, the null hypothesis is the 
claim that we’re testing, and the alternate hypothesis is what we’ll accept if 
there’s sufficient evidence against the null hypothesis. 

So what are the null and alternate hypotheses? 




It’s still the same problem 

For the last test, we took the claims made by the drug company and used 
these as the basis for the null hypothesis. We’re testing the same claims, so the 
null hypothesis is still the same. We have 

H₀: p = 0.9 


The alternate hypothesis is the same too. If there is strong evidence 
against the claims made by the drug company, then we’ll accept that the 
drug cures fewer than 90% of the patients. This gives us an alternate 
hypothesis of: 


H₁: p < 0.9 


So you still don’t 
believe me? Think you 
can have another shot 
at me? Bring it on! 


Step 2: Choose the test statistic 

As before, the next step is to choose the test statistic. In other words, we 
need some statistic that we can use to test the hypothesis. 

For the previous hypothesis test, we conducted the test by looking at the 
number of successes in the sample and seeing how significant the result 
was. We used the binomial distribution to find the probability of getting a 
result at least as extreme as the value we got in the sample. In other words, 
we used a test statistic of X ~ B(15, 0.9) to test whether P(X ≤ 11) was less 
than 0.05, the level of significance. 

This time the number of people in the sample is 100, and we’re testing 
the same claim, that the probability of successfully curing someone is 0.9. 
This means that our new test statistic is X ~ B(100, 0.9). 




Are you kidding me? 

If we have to calculate 
probabilities using the 
binomial distribution, we’ll 
be here forever. 


We can use another probability distribution instead 
of the binomial. 

Using the binomial distribution for this sort of problem would be time 
consuming, as we’d have to calculate lots of probabilities. 

Fortunately, there’s another way. Rather than use the binomial 
distribution, we can use some other distribution instead. 



What probability distribution could you use to approximate X ~ B(100, 0.9)? 



Exercise 


To get the most out of hypothesis tests, you need to know how different variables and 
parameters are distributed. What distributions would you use to find probabilities for the 
following situations? 

Hint: We covered all of these earlier in the book. If you get stuck, look back 
through the chapters. 


1. X ~ B(n, p). What probability distribution could you use to approximate this if n is large, np > 5 and nq > 5? 


2. X ~ N(μ, σ²). You know the value of μ and σ². What’s the distribution of X̄? 


3. X ~ N(μ, σ²), and you know what μ is, but you don’t know what the value of σ² is. The sample size is large. What’s 
the distribution of X̄ given the data you have? 


4. X ~ N(μ, σ²), you know what μ is, but you don’t know what the value of σ² is. The sample size is small. What’s the 
distribution of X̄ given the data you have? 



Exercise Solution 

To get the most out of hypothesis tests, you need to know how different variables and parameters 
are distributed. What distributions would you use to find probabilities for the following situations? 

Hint: We covered all of these earlier in the book. If you get stuck, look back through the chapters. 

1. X ~ B(n, p). What probability distribution could you use to approximate this if n is large, np > 5 and nq > 5? 

If n is large, we can approximate X ~ B(n, p) using the normal distribution. E(X) = np and 
Var(X) = npq, so this means we use X ~ N(np, npq). This assumes np > 5 and nq > 5. 


2. X ~ N(μ, σ²). You know the value of μ and σ². What’s the distribution of X̄? 

If we know what the value is of σ², X̄ ~ N(μ, σ²/n). 


3. X ~ N(μ, σ²), and you know what μ is, but you don’t know what the value of σ² is. The sample size is large. What’s 
the distribution of X̄ given the data you have? 

If we don’t know what the value is of σ², we estimate it using s². So, X̄ ~ N(μ, s²/n). 


4. X ~ N(μ, σ²), you know what μ is, but you don’t know what the value of σ² is. The sample size is small. What’s the 
distribution of X̄ given the data you have? 

If we don’t know what the value is of σ², we estimate it using s². If the sample size is small, we need to use 
the t-distribution T ~ t(n - 1), where T = (X̄ - μ)/(s/√n). 
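
As a quick illustration of the last case, here is a sketch that computes T = (X̄ - μ)/(s/√n) for a small sample. The sample values and the population mean of 5.0 are made up for this example and do not come from the SnoreGull trial; `t_statistic` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample: list[float], mu: float) -> float:
    """T = (sample mean - mu) / (s / sqrt(n)), where s is the sample
    standard deviation (computed with n - 1 in the denominator)."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))


sample = [5.2, 4.8, 5.5, 5.1, 4.9]   # hypothetical small sample
t = t_statistic(sample, mu=5.0)       # mu = 5.0 is an assumed population mean
degrees_of_freedom = len(sample) - 1  # T ~ t(n - 1)
print(round(t, 4), degrees_of_freedom)
```

You would then compare the resulting value against the t-distribution with n - 1 degrees of freedom, using t-distribution probability tables.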


Use the normal to approximate the binomial in our test statistic 


We still need to find a test statistic we can use in our hypothesis test, and 
as the number in the sample is large, this means that using the binomial 
distribution will be time consuming and complicated. 

There are 100 people in the sample, and the proportion of successes 
according to the drug company is 0.9. In other words, the number of 
successes follows a binomial distribution, where n = 100 and p = 0.9. 


As n is large, and both np and nq are greater than 5, we can use 
X ~ N(np, npq) as our test statistic, where X is the number of patients 
successfully cured. In other words, we can use 


X ~ N(90, 9) 



to approximate any probabilities that we may need. 


If we standardize this, we get 


Z = (X - 90)/√9 = (X - 90)/3 

Here we’re standardizing X ~ N(90, 9). 




You use the test statistic to work out probabilities 
you can use as evidence. 

This means that we use Z as our test statistic, as we can easily use it to 
look up probabilities and see how unlikely the results of our sample 
are given the claims of the drug company. We substitute our value of 
80 in place of X, so we can use it to find the probability of 80 or fewer 
being cured. 
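
The approximation and the standardization can be sketched in code like this. The helper `normal_approximation` is an illustrative name of our own; it simply checks the np > 5 and nq > 5 conditions and returns the mean np and variance npq.

```python
from math import sqrt


def normal_approximation(n: int, p: float) -> tuple[float, float]:
    """Mean and variance of the normal approximation to B(n, p).

    Raises ValueError if np > 5 and nq > 5 do not both hold.
    """
    q = 1 - p
    if not (n * p > 5 and n * q > 5):
        raise ValueError("np and nq must both be greater than 5")
    return n * p, n * p * q


mean, variance = normal_approximation(100, 0.9)  # 90 and 9, up to rounding
z = (80 - mean) / sqrt(variance)                 # standard score for 80 cured
print(round(mean, 2), round(variance, 2), round(z, 2))
```

The standard score tells us that 80 cured patients is more than three standard deviations below what the drug company’s claim would lead us to expect.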


Step 3: Find the critical region 

Now that we have a test statistic for our test, we need to come up 
with a critical region. As our alternate hypothesis is p < 0.9, this 
means that our critical region lies in the lower tail just as before. 

The critical region also depends on the significance level of the 
test. Let’s choose the same significance level as before, so let’s test 
at the 5% level. 





As our test statistic follows a standard normal distribution, we 
can use probability tables to find the critical value, c. The critical 
value is the boundary between whether we have strong enough 
evidence to reject the null hypothesis or not. 

As our significance level is 5%, this means that our critical 
value c is the value where P(Z < c) = 0.05. If we look up the 
probability 0.05 in the probability tables, this gives us a value for 
c of -1.64. In other words, 


P(Z < -1.64) = 0.05 


This means that if our test statistic is less than -1.64, we have strong 
enough evidence to reject the null hypothesis. 
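
If you don’t have probability tables to hand, the critical value can be looked up with the standard library instead. This is a sketch rather than the book’s method: it uses `statistics.NormalDist.inv_cdf`, and the variable name `x_cut` for the de-standardized cutoff is our own.

```python
from statistics import NormalDist

alpha = 0.05

# Critical value c with P(Z < c) = alpha for the standard normal distribution.
c = NormalDist().inv_cdf(alpha)

# Undo the standardization Z = (X - 90)/3 to express the cutoff
# as a number of patients cured.
x_cut = 90 + 3 * c

print(round(c, 3), round(x_cut, 2))
```

The computed value is about -1.645, which probability tables quoted to two decimal places give as -1.64. In terms of patients, the critical region covers results of roughly 85 or fewer cured out of 100.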




















Exercise 


Think you can go through the remaining steps of the hypothesis test? See if you can find the 
following: 


Step 4: Find the p-value 

The critical region is in the lower tail of the distribution. 80 people were cured, 
and Z = (X - 90)/3. Use this to find the p-value. 


Step 5: See whether the test statistic is within the critical region 

Remember that the significance level for the hypothesis test is 5%. 


Step 6: Make your decision 

Do you accept or reject the null hypothesis based on the evidence? 



Exercise Solution 

Think you can go through the remaining steps of the hypothesis test? See if you can find the 
following: 


Step 4: Find the p-value 

The critical region is in the lower tail of the distribution. 80 people were cured, 
and Z = (X - 90)/3. Use this to find the p-value. 

Let’s start by finding the standard score of 80. 

z = (80 - 90)/3 
  = -10/3 
  = -3.33 

The p-value is given by P(Z < z) = P(Z < -3.33). Looking this up in probability tables gives us 
p-value = 0.0004. 


Step 5: See whether the test statistic is within the critical region 

Remember that the significance level for the hypothesis test is 5%. 

The test statistic is in the critical region if the p-value is less than 0.05. As the p-value is equal 
to 0.0004, this means that the test statistic is within the critical region. 


Step 6: Make your decision 

Do you accept or reject the null hypothesis based on the evidence? 

As the test statistic is within the critical region for the hypothesis test, this means we have 
sufficient evidence to reject the null hypothesis at the 5% level. 
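
For comparison, here is a sketch that finds the p-value both ways: with the normal approximation used in the solution, and by summing the binomial probabilities directly, which is tedious by hand but quick for a computer. The variable names are our own, and only the Python standard library is assumed.

```python
from math import comb
from statistics import NormalDist

# Normal approximation: X ~ N(90, 9), so Z = (X - 90)/3.
z = (80 - 90) / 3
p_normal = NormalDist().cdf(z)  # P(Z < z)

# Exact binomial calculation: P(X <= 80) for X ~ B(100, 0.9).
p_exact = sum(comb(100, k) * 0.9**k * 0.1 ** (100 - k) for k in range(81))

alpha = 0.05
print(p_normal < alpha, p_exact < alpha)
```

The two p-values are not identical, since the normal distribution is only an approximation, but both fall well below the 5% significance level, so the decision is the same.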


SnoreCull failed the test 

This time when we performed a hypothesis test on SnoreGull, 
there was sufficient evidence to reject the null hypothesis. 

In other words, we can reject the claims made by the drug 
company. 




Shouldn’t we have 
just accepted the doctor’s 
opinion in the first place? 


Hypothesis tests require evidence. 

With a hypothesis test, you accept a claim and then put it on trial. You 
only reject it if there’s enough evidence against it. This means that the 
tests are impartial, as you only make a decision based on whether or 
not there’s sufficient evidence. 

If we had just accepted the doctor’s opinion in the first place, we 
wouldn’t have properly considered the evidence. We would have 
made a decision without considering whether the results could have 
been explained away by mere coincidence. As it is, we have enough 
evidence to show that the results of the sample are extreme enough 
to justify rejecting the null hypothesis. The results are statistically 
significant, as they’re unlikely to have happened by chance. 

So does this guarantee that the claims of the drug company 
are wrong? 


Mistakes can happen 

So far we’ve looked at how we can use the results of a sample as evidence in 
a hypothesis test. If the evidence is sufficiently strong, then we can use it to 
justify rejecting the null hypothesis. 

We’ve found that there is strong evidence that the claims of the drug 
company are wrong, but is this guaranteed? 


Of course it is. We’ve 
done a hypothesis test, and 
we’ve used it to prove that 
the drug company is lying. 


Even though the evidence is strong, we can’t absolutely 
guarantee that the drug company claims are wrong. 

Even though it’s unlikely, we could still have made the wrong decision. We can 
examine evidence with a hypothesis test, and we can specify how certain we want 
to be before rejecting the null hypothesis, but it doesn’t prove with absolute 
certainty that our decision is right. 

The question is, how do we know? 

Conducting a hypothesis test is a bit like putting a prisoner on trial in front 
of a jury. The jury assumes that the prisoner is innocent unless there is strong 
evidence against him, but even considering the evidence, it’s still possible for 
the jury to make wrong decisions. Have a go at the exercise on the next page, 
and you’ll see how. 



There are no Dumb Questions 


Q: How can we make the wrong decision if we’re conducting 
a hypothesis test? Don’t we do a hypothesis test to make sure 
we don’t? 

A: When you conduct a hypothesis test, you can only make a 
decision based on the evidence that you have. Your evidence is 
based on sample data, so if the sample is biased, you may make the 
wrong decision based on biased data. 


Q: I’ve heard of something called significance tests. What are 
they? 

A: Some people call hypothesis tests significance tests. This is 
because you test at a certain level of significance. 



Sharpen your pencil 


A prisoner is on trial for a crime, and you’re on the jury. The jury’s 
task is to assume the prisoner is innocent, but if there’s enough 
evidence against him, they need to convict him. 


1. In the trial, what's the null hypothesis? 


2. What’s the alternate hypothesis? 


3. In what ways can the jury make a verdict that’s correct? 


4. In what ways can the jury make a verdict that’s incorrect? 



Sharpen your pencil Solution 

A prisoner is on trial for a crime, and you’re on the jury. The jury’s 
task is to assume the prisoner is innocent, but if there’s enough 
evidence against him, they need to convict him. 


1. In the trial, what’s the null hypothesis? 

The null hypothesis is that the prisoner is innocent, as this is what we have to 
assume until there’s proof otherwise. 


2. What’s the alternate hypothesis? 

The alternate hypothesis is that the prisoner is guilty. In other words, if there’s sufficient proof 
that the prisoner is not innocent, then we’ll accept that he’s guilty and convict him. 


3. In what ways can the jury make a verdict that’s correct? 

We can make a correct verdict if: 

The prisoner is innocent, and we find him innocent. 

The prisoner is guilty, and we find him guilty. 


4. In what ways can the jury make a verdict that’s incorrect? 

We can make an incorrect verdict if: 

The prisoner is innocent, and we find him guilty. 

The prisoner is guilty, and we find him innocent. 


So what does 
putting prisoners on 
trial have to do with 
hypothesis testing? 


The errors we can make when conducting a hypothesis 
test are the same sort of errors we could make when 
putting a prisoner on trial. 

Hypothesis tests are basically tests where you take a claim and put it on trial 
by assessing the evidence against it. If there’s sufficient evidence against it, 
you reject it, but if there’s insufficient evidence against it, you accept it. 


You may correctly accept or reject the null hypothesis, but even considering 
the evidence, it’s also possible to make an error. You may reject a valid null 
hypothesis, or you might accept it when it’s actually false. 

Statisticians have special names for these types of errors. A Type I error is 
when you wrongly reject a true null hypothesis, and a Type II error is when 
you wrongly accept a false null hypothesis. 


The power of a hypothesis test is the probability that you will correctly 
reject a false null hypothesis. 

                          Decision from hypothesis test 
                          Accept H₀         Reject H₀ 
Actual       H₀ true      Correct           Type I error 
situation    H₀ false     Type II error     Correct 






How do you think we can find the probability of making a Type I error? How do 
you think we can find the probability of making a Type II error? 




Let’s start with Type I errors 


A Type I error is what you get when you reject the null hypothesis when 
the null hypothesis is actually correct. It’s like putting a prisoner on trial 
and finding him guilty when he’s actually innocent. 




So what’s the probability of 
getting a Type I error? 

If you get a Type I error, then this means that 
the null hypothesis must have been rejected. 

In order for the null hypothesis to have been 
rejected, the results of your sample must be in 
the critical region. 




The probability of getting a Type I error is the probability of your 
results being in the critical region. As the critical region is defined by the 
significance level of the test, this means that if the significance level of 
your test is α, the probability of getting a Type I error must also be α. 


In other words, 


P(Type I error) = α 

where α is the significance level of the test. 


What about Type II errors? 

A Type II error is what you get when you accept the null hypothesis, 
and the null hypothesis is actually wrong. It’s like putting a prisoner on 
trial and finding him innocent when he’s actually guilty. 







The probability of getting a Type II error is normally represented by
the Greek letter β.


P(Type II error) = β


So how do we find β?

Finding the probability of a Type II error is more difficult than finding 
the probability of getting a Type I error. Here are the steps that are 
involved, and we’ll show you how to go through them on the next page. 


1. Check that you have a specific value for H1.

   Without this, you can’t calculate the probability of getting a Type II error.

2. Find the range of values outside the critical region of your test.

   If your test statistic has been standardized, the range of values must be de-standardized.

3. Find the probability of getting this range of values, assuming H1 is true.

   In other words, we find the probability of getting the range of values outside the critical
   region, but this time, using the test statistic described by H1 rather than H0.


Finding errors for SnoreCull


Let’s see if we can find the probability of getting Type I and Type II errors for
the SnoreCull hypothesis test. As a reminder, our standardized test statistic is

Z = (X - 90) / 3

where X is the number of people cured in the sample. The significance level of
the test is 5%.


Let's start with the Type I error 


A Type I error is what you get when you reject the null hypothesis when 
actually it’s true. The probability of getting this sort of error is the same as the 
significance level of the test, so this means that 


P(Type I error) = 0.05 




So what about the Type II error? 


A Type II error is what you get when you accept the null hypothesis when the 
alternate hypothesis is true. We can only calculate this if H 1 specifies a single 
specific value, so let’s use an alternate hypothesis of p = 0.8, as this is the 
proportion of successes in the doctor’s sample. This means that our hypotheses 
become 


H0: p = 0.9
H1: p = 0.8


The reason why H 1 must specify an exact value for p is so that we can calculate 
probabilities using it. If we used an alternate hypothesis of p < 0.9, we wouldn’t 
be able to use it to calculate the probability of getting a Type II error. 



To look up probabilities using the hypothesis probability
distribution, we need an exact value for p.


If you need to calculate the
probability of getting a Type II error
in an exam, you’ll be given H1.


This means that you won’t have to decide on the 
alternate hypothesis yourself. If you need to calculate 
this sort of error, it will be given to you. 


We need to find the range of values

Now that the alternate hypothesis H 1 gives a specific value for p, we can move 
on to the next step. We need to find the values of X that lie outside the critical 
region of the hypothesis test. 

We saw back on page 548 that the critical region for the test is given by Z 
< -1.64 — in other words, P(Z < -1.64) = 0.05. This means that values that fall 
outside the critical region are given by Z > -1.64. 



If we de-standardize this, we get

(X - 90) / 3 > -1.64
      X - 90 > -1.64 × 3
           X > -4.92 + 90
           X > 85.08


In other words, we would have accepted the null hypothesis if 85.08 people
or more had been cured by SnoreCull.
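The de-standardizing step can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the "exact" critical value comes from the standard library's `statistics.NormalDist`, which is an alternative to the probability tables used in the text.

```python
from statistics import NormalDist

# Null hypothesis for the test: p = 0.9 with n = 100 people in the sample.
n, p0, alpha = 100, 0.9, 0.05
mean0 = n * p0                        # np = 90
sd0 = (n * p0 * (1 - p0)) ** 0.5      # sqrt(npq) = 3

# Critical value from the probability tables, as used in the text.
z_crit_table = -1.64
x_crit_table = mean0 + z_crit_table * sd0    # de-standardize: X = 90 + 3Z

# The same boundary using the inverse normal CDF instead of tables.
z_crit_exact = NormalDist().inv_cdf(alpha)
x_crit_exact = mean0 + z_crit_exact * sd0

print(round(x_crit_table, 2))   # boundary from the tables
print(round(x_crit_exact, 2))   # slightly different, since -1.64 is rounded
```

The two boundaries differ only in the second decimal place, because the table value -1.64 is a rounded version of the exact critical value.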

The final thing we need to do is work out P(X > 85.08), assuming that H1
is true. That way, we’ll be able to work out the probability of accepting the
null hypothesis when actually H1 is true instead. As we’re using the normal
distribution to approximate X, we need to use a probability distribution
X ~ N(np, npq), where n = 100 and p = 0.8. This gives us

X ~ N(80, 16)


This means that if we can calculate P(X > 85.08) where X ~ N(80, 16), we’ll
have found the probability of getting a Type II error.

We calculate this in the same way we calculate other normal distribution 
probabilities, by finding the standard score and then looking up the value in 
standard normal probability tables. 


Find P(Type II error) 


We can find the probability of getting a Type II error by calculating
P(X > 85.08) where X ~ N(80, 16). Let’s start off by finding the standard
score of 85.08.


z = (85.08 - 80) / √16
  = 5.08 / 4
  = 1.27

This is the usual way of finding a standard score: just subtract the
expectation and divide by the standard deviation.


This means that in order to find P(X > 85.08), we need to use standard 
probability tables to find P(Z > 1.27). 


P(Z > 1.27) = 1 - P(Z < 1.27)
            = 1 - 0.8980
            = 0.102


In other words,


P(Type II error) = 0.102
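As a cross-check on the table lookup, the same probability can be computed directly. This is a minimal sketch that assumes the normal approximation X ~ N(80, 16) from the text and uses `statistics.NormalDist` from the Python standard library in place of probability tables. The variable names are illustrative.

```python
from statistics import NormalDist

# Alternate hypothesis H1: p = 0.8, with n = 100.
n, p1 = 100, 0.8
mean1 = n * p1                        # np = 80
sd1 = (n * p1 * (1 - p1)) ** 0.5      # sqrt(npq) = sqrt(16) = 4

# We accept H0 whenever X > 85.08 (the de-standardized boundary).
x_crit = 85.08

# P(Type II error) = P(X > 85.08) when X ~ N(80, 16).
beta = 1 - NormalDist(mu=mean1, sigma=sd1).cdf(x_crit)

print(round(beta, 3))
```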



Q: Why is it so much harder to find P(Type II error) than
P(Type I error)?

A: It’s because of the way they’re defined. A Type I error is what
you get when you wrongly reject the null hypothesis. The probability
of getting this sort of error is the same as α, the significance level of
the test.

A Type II error is the error you get when you accept the null 
hypothesis when actually the alternate hypothesis is true. To find the 
probability of getting this sort of error, you need to start by finding 
the range of values in your sample that would mean you accept the 
null hypothesis. Once you’ve found these values, you then have to 
calculate the probability of getting them assuming that H 1 is true. 


Q: Do I need to use the normal distribution every time I want
to find the probability of getting a Type II error?

A: The probability distribution you use all depends on your test
statistic. In this case, our test statistic followed a normal distribution, 
so that’s the distribution we used to find P(Type II error). If our test 
statistic had followed, say, a Poisson distribution, we would have 
used a Poisson distribution instead. 


Introducing power


So far we’ve looked at the probability of getting different types of error in 
our hypothesis test. One thing that we haven’t looked at is power. 

The power of a hypothesis test is the probability that we will reject H0
when H0 is false. In other words, it’s the probability that we will make the
correct decision to reject H0.


That sounds complicated. I
hope it’s not as difficult to
find as P(Type II error).


Once you’ve found P(Type II error), calculating the 
power of a hypothesis test is easy. 

Rejecting H0 when H0 is false is actually the opposite of making a Type II
error. This means that


Power = 1 - β


where β is the probability of making a Type II error.


So what’s the power of SnoreCull?



We’ve found the probability of getting a Type II error is 0.102. This
means that we can find the power of the SnoreCull hypothesis test by
calculating


Power = 1 - P(Type II error)
      = 1 - 0.102
      = 0.898

In other words, the power of the SnoreCull hypothesis test is 0.898. This
means that the probability that we will make the correct decision to reject
the null hypothesis is 0.898.
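Power depends on which specific value the alternate hypothesis uses. The sketch below wraps the calculation in a hypothetical helper, `power`, so you can see how the figure changes for other values of p. It assumes the same normal approximation and the same boundary of 85.08 as the worked example. The function name and its parameters are illustrative, not something defined in the book.

```python
from statistics import NormalDist

def power(p1, n=100, x_crit=85.08):
    """Probability of rejecting H0 when the true proportion is p1.

    H0 is rejected when X falls below x_crit, and under H1 the
    normal approximation is X ~ N(n*p1, n*p1*(1 - p1)).
    """
    alt = NormalDist(mu=n * p1, sigma=(n * p1 * (1 - p1)) ** 0.5)
    return alt.cdf(x_crit)

snorecull_power = power(0.8)
print(round(snorecull_power, 3))        # power for H1: p = 0.8

# The further the true p is from the claimed 0.9, the higher the power.
for p1 in (0.70, 0.80, 0.85):
    print(p1, round(power(p1), 3))
```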


The doctor's happy 

In this chapter, you’ve run through two hypothesis tests, and 
you’ve proved that there’s sufficient evidence to reject the claims 
made by the drug company. You’ve been able to show that based 
on the doctor’s sample, there’s sufficient evidence that SnoreCull
doesn’t cure 90% of snorers, as the drug company claims. 



I thought that the claims sounded too
good to be true, and you’ve proved that
there are strong statistical grounds
for showing I’m right. I’ll sleep quieter
at night knowing that.


But it doesn't stop there

Keep reading, and we’ll show you what 
other sorts of hypothesis tests you can use. 
We’ll see you over at Fat Dan’s Casino... 



The drug company and their cough syrup manufacturer are having a dispute. The factory says 
that the amount of syrup that gets poured into their bottles follows a distribution 
X ~ N(355, 25), where X is the amount of syrup in the bottle measured in mL. The drug company 
conducted tests on a large sample and found that the mean amount of syrup in 100 bottles
is 356.5 mL. Test the hypothesis that the factory mean is correct at a 1% level of significance
against the alternative that the mean amount of syrup in a bottle is greater than 355 mL.

We’re going to guide you through this exercise in two parts. Here are the first three steps. 


Step 1: Decide on the hypothesis you’re going to test. What’s the null hypothesis? What’s the alternate 
hypothesis? 


Step 2: Choose your test statistic.

Hint: Your hypothesis concerns the mean, so what’s the distribution of X̄? How
do you standardize this?


Step 3: Determine the critical region for your decision. Does the critical region lie in the lower or upper tail of the 
distribution? What’s the significance level? What’s the critical value? 
Exercise solution


Step 1: Decide on the hypothesis you’re going to test. What’s the null hypothesis? What’s the alternate 
hypothesis? 

We want to test whether the mean amount of syrup in the bottles is 355 mL like the factory says. This
gives us

H0: μ = 355
H1: μ > 355

Step 2: Choose your test statistic. 


X ~ N(355, 25), so this means that under the null hypothesis, X̄ ~ N(355, 25/100), or X̄ ~ N(355, 0.25).

If we standardize this, we get

Z = (X̄ - 355) / √0.25
  = (X̄ - 355) / 0.5


Step 3: Determine the critical region for your decision. Does the critical region lie in the lower or upper tail of the 
distribution? What’s the significance level? What’s the critical value? 

The alternate hypothesis is μ > 355, which means the critical region lies in the upper tail. We want to test
at the 1% level, so the critical region is defined by P(Z > c) = 0.01. Using probability tables,
this gives us c = 2.32. In other words, the critical region is given by Z > 2.32.



This exercise continues where the last left off. Here are the final three steps of the hypothesis 
test. What do you conclude? 


Step 4: Find the p-value of the test statistic. Use the distribution Z = (X̄ - 355)/0.5, the mean amount of syrup in
the sample, and remember that this time you’re seeing if your test statistic lies in the upper tail of the distribution, as 
this is where the critical region is. 


Step 5: See whether the sample result is within the critical region. Remember that you’re testing at the 1% 
significance level. 


Step 6: Make your decision. Is there enough evidence to reject the null hypothesis at the 1 % level of significance? 


Exercise solution


Step 4: Find the p-value of the test statistic. Use the distribution Z = (X̄ - 355)/0.5, the mean amount of syrup in the
sample, and remember that this time you’re seeing if your test statistic lies in the upper tail of the distribution as this 
is where the critical region is. 


Z = (X̄ - 355) / 0.5
  = (356.5 - 355) / 0.5
  = 1.5 / 0.5
  = 3

The p-value for this is given by P(Z > 3), as the critical region is in the upper tail. Looking this up in
probability tables gives us

p-value = 0.0013

Step 5: See whether the sample result is within the critical region. Remember that you’re testing at the 1%
significance level.

The p-value 0.0013 is less than 0.01, the significance level, so that means the sample result is
within the critical region.


Step 6: Make your decision. Is there enough evidence to reject the null hypothesis at the 1 % level of significance? 

As the sample result lies in the critical region, there’s sufficient evidence to reject the null hypothesis. We
can accept the alternate hypothesis μ > 355 mL.
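The six steps of the cough syrup test can be replayed in a short script. This is a sketch under the assumptions stated in the exercise (population variance 25, a sample of 100 bottles, sample mean 356.5 mL, 1% significance). It uses `statistics.NormalDist` instead of probability tables, so the p-value carries a little more precision than the table figure.

```python
from statistics import NormalDist

# Values taken from the exercise.
mu0 = 355            # mean claimed by the factory (H0)
variance = 25        # population variance from X ~ N(355, 25)
n = 100              # number of bottles sampled
sample_mean = 356.5  # observed mean in mL
alpha = 0.01         # significance level

# The sample mean has variance 25/100, so its standard deviation is 0.5.
standard_error = (variance / n) ** 0.5

# Standardized test statistic.
z = (sample_mean - mu0) / standard_error

# Upper-tail test, so the p-value is P(Z > z).
p_value = 1 - NormalDist().cdf(z)

reject_h0 = p_value < alpha
print(z, round(p_value, 4), reject_h0)
```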


BULLET POINTS - 

■ A Type I error is when you reject the null hypothesis when it’s actually
correct. The probability of getting a Type I error is α, the significance level of
the test.

■ A Type II error is when you accept the null hypothesis when it’s wrong. The
probability of getting a Type II error is represented by β.

■ To find β, your alternate hypothesis must have a specific value. You then
find the range of values outside the critical region of your test, and then find
the probability of getting this range of values under H1.
14 the χ² distribution


There’s Something Going On…



Sometimes things don’t turn out quite the way you expect. 

When you model a situation using a particular probability distribution, you have a 
good idea of how things are likely to turn out long-term. But what happens if there are 
differences between what you expect and what you get? How can you tell whether 
your discrepancies come down to normal fluctuations, or whether they’re a sign of 
an underlying problem with your probability model instead? In this chapter, we'll 
show you how you can use the χ² distribution to analyze your results and sniff out
suspicious results. 




There may be trouble ahead at Fat Dan's Casino

Fat Dan’s is used to making a tidy profit from its 
casino-goers, but this week there’s a problem. The slot 
machines keep hitting the jackpot, the roulette wheel 
keeps landing on 12, the dice are loaded, and too many 
people are winning off one of the blackjack tables. 

The casino can’t support the loss for much longer, and 
Fat Dan suspects foul play. He needs your help to get 
to the bottom of what’s going on. 





Let’s start with the slot machines

As you’ve seen before, Fat Dan’s Casino has a full row of bright, shiny slot 
machines, just waiting to be played. The trouble is that people keep on playing 
them — and winning. 


Here’s the expected probability distribution for one of the slot machines, where X 
represents the net gain from each game played: 


It’s $2 per game, so if you don’t win anything, you lose your $2.

x           -2      23      48      73      98
P(X = x)    0.977   0.008   0.008   0.006   0.001


The casino has collected statistics showing the number of times people get each 
outcome. Here are the frequencies for the observed net gains per game: 


x           -2      23      48      73      98
Frequency   965     10      9       9       7


Sharpen your pencil

We need to compare the actual frequency of each value of x with
what you’d expect the frequency to be based on the probability
distribution. Fill in the table below. What do you notice?


x      Observed frequency    Expected frequency
-2     965                   977
23     10
48     9
73     9
98     7
Sharpen your pencil solution

There’s a difference between the number of people you’d expect to win the jackpot, based
on the probability distribution, and the number of people actually winning it. What we don’t
know is how significant these differences are.


x      Observed frequency    Expected frequency
-2     965                   977
23     10                    8
48     9                     8
73     9                     6
98     7                     1



Looking at the data, it looks 
like there might be something 
going on with the slot machine 
payouts. But how can we be 
certain? It’s unlikely, but this
could happen by pure chance. 


We need some way of deciding whether these results 
show the slot machines have been rigged. 

What we need is some sort of hypothesis test that we can use to test the 
differences between the observed and expected frequencies. That way, we’ll 
have some way of deciding whether the slot machines have been tampered 
with to make sure they keep paying out lots of money. 

The question is, what sort of distribution can we use for this hypothesis test? 


The χ² test assesses difference


There’s a new sort of probability distribution that does exactly what we want;
it’s called the χ² distribution. χ is pronounced “kye,” and it’s the
Greek letter chi. It uses a test statistic to look at the difference between what
we expect to get and what we actually get, and then returns the probability of 
getting observed frequencies as extreme. 

Let’s start with the test statistic. To find the test statistic, first make a table 
featuring the observed and expected frequencies for your problem. When 
you’ve done that, use your observed and expected frequencies to compute the 
following statistic, where O stands for the observed frequency, and E for the
expected frequency:

X² = Σ (O - E)² / E


In other words, for each probability in the probability distribution, you take 
the difference between the frequency you expect and the frequency you 
actually get. You square the result, divide by the expected frequency, and 
then add all of these results up together. 

So what’s the test statistic for the slot machine problem? 


Sharpen your pencil



Use the table of observed and expected frequencies you just 
worked out on the previous page for Fat Dan’s slot machines to 
compute the test statistic. What result do you get? 


What do you think a low value tells you? What about a high value? 
Sharpen your pencil solution


X² = (965 - 977)²/977 + (10 - 8)²/8 + (9 - 8)²/8 + (9 - 6)²/6 + (7 - 1)²/1
   = (-12)²/977 + 2²/8 + 1²/8 + 3²/6 + 6²/1
   = 144/977 + 4/8 + 1/8 + 9/6 + 36
   = 0.147 + 0.5 + 0.125 + 1.5 + 36
   = 38.272

The higher the value of the test statistic, the greater the differences between
observed and expected frequencies become.


So what does the test statistic represent? 


The test statistic X² gives a way of measuring the difference between the
frequencies we observe and the frequencies we expect. The smaller the value
of X², the smaller the difference overall between the observed and expected
frequencies.
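The test statistic is easy to compute in code. The sketch below defines a small helper, `chi_square_statistic` (an illustrative name, not part of any library), and applies it to the slot machine frequencies. The expected frequencies are taken to be the total number of observed games multiplied by each probability.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Slot machine data: net gains of -2, 23, 48, 73 and 98.
probabilities = [0.977, 0.008, 0.008, 0.006, 0.001]
observed = [965, 10, 9, 9, 7]

# Expected frequency = total number of games x probability of each outcome.
total_games = sum(observed)
expected = [total_games * p for p in probabilities]

x2 = chi_square_statistic(observed, expected)
print(round(x2, 3))

# Identical observed and expected frequencies give a statistic of zero.
print(chi_square_statistic(expected, expected))
```

Most of the total comes from the jackpot class, where 7 wins were observed against an expected 1.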

You divide by E, the expected frequency, as this makes the result proportional
to the expected frequency.

So at what point does X² become so large that it’s significant? We need to
figure out when we can be fairly certain that something’s going on with the slot
machines that’s beyond what could reasonably happen by chance.

To find this out, we need to look at the χ² distribution.


Two main uses of the χ² distribution

The χ² probability distribution specializes in detecting when the results you
get are significantly different from the results you expect. The probability
distribution does this using the X² test statistic you saw earlier.

The χ² distribution has two key purposes.

First of all, it’s used to test goodness of fit. This means that you can use
it to test how well a given set of data fits a specified distribution. As an
example, we can use it to test how well the observed frequencies for the slot
machine winnings fit the distribution we expect.

Another use of the χ² distribution is to test the independence of two
variables. It’s a way of checking whether there’s some sort of association.

The χ² distribution takes one parameter, the Greek letter ν, pronounced
“new.” Let’s take a look at the effect that ν has on the shape of the
probability distribution.


When ν is 1 or 2

When ν has a value of 1 or 2, the shape of the χ² distribution
follows a smooth curve, starting off high and getting lower. Its
shape is like a reverse J. The probability of getting low values of
the test statistic X² is much higher than getting high values. In
other words, observed frequencies are likely to be close to the
frequency you expect.



When ν is greater than 2

When ν has a value that’s greater than 2, the shape of the χ²
distribution changes. It starts off low, gets larger, and then
decreases again as X² increases. The shape is positively skewed,
but when ν is large, it’s approximately normal.


It has this sort of shape when ν is greater than 2. The larger ν becomes, the
more normal the distribution gets.

A shorthand way of saying that you’re using the test statistic X² with the
χ² distribution that has a particular value of ν is

X² ~ χ²(ν)

X² follows a χ² distribution with a particular value of ν. (χ is like X, but curvier.)


ν represents degrees of freedom

You’ve seen how the shape of the χ² distribution depends on the value of ν, but
how do we find what ν is?

ν is the number of degrees of freedom. It’s the number of independent
variables used to calculate the test statistic X², or the number of independent
pieces of information. Let’s see what this means in practice.

Here’s another look at the table of observed and expected frequencies for the 
slot machines: 


x      Observed frequency    Expected frequency
-2     965                   977
23     10                    8
48     9                     8
73     9                     6
98     7                     1


The number of degrees of freedom is the number of expected frequencies we 
have to calculate, taking into account any restrictions we have upon us. 

In order to calculate the test statistic X², we had to calculate all of the expected
frequencies. This meant that we had to calculate five expected frequencies. 
While calculating this, we had one thing we had to bear in mind: the total 
expected frequency and the total observed frequency had to add up to the same 
amount. In other words, we had one restriction on us in our calculations. 

So what's ν?

To calculate ν, we take the number of pieces of information we calculated, and
subtract the number of restrictions. To figure out the test statistic X², we had to
calculate five separate pieces of information, with 1 restriction. This means that
the number of degrees of freedom is given by

ν = 5 - 1
  = 4

Another way of looking at this is that we had to calculate four of the expected 
frequencies using the probability distribution. We could work out the final 
frequency by looking at what the total expected frequency should be. 

In general, 


ν = (number of classes) - (number of restrictions)
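One way to see where the single restriction comes from is to calculate only four of the expected frequencies and let the total fix the fifth. The sketch below does this for the slot machine data. The variable names are illustrative, and the total of 1000 games is the sum of the observed frequencies.

```python
probabilities = [0.977, 0.008, 0.008, 0.006, 0.001]
total_games = 1000   # total observed frequency: 965 + 10 + 9 + 9 + 7

# Four expected frequencies are found from the probability distribution...
free_expected = [total_games * p for p in probabilities[:-1]]

# ...and the last one is forced by the restriction that the expected
# frequencies must add up to the same total as the observed frequencies.
last_expected = total_games - sum(free_expected)

num_classes = len(probabilities)
num_restrictions = 1
nu = num_classes - num_restrictions

print(round(last_expected, 6), nu)
```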
What’s the significance?


So how can we use the χ² distribution to say how significant the
discrepancy is between the observed and expected frequencies? As
with other hypothesis tests, it all depends on the level of significance.

When you conduct a test using the χ² distribution, you conduct a
one-tailed test using the upper tail of the distribution as your critical region.
This way, you can specify the likelihood of your results coming from
the distribution you expect by checking whether the test statistic lies in
the critical region of the upper tail.

If you conduct a test at significance level α, then you write this as

χ²α(ν)




So how do we find the critical region for the χ² distribution? We can use
χ² probability tables.


How to use χ² probability tables

To find the critical value, start off with the degrees of freedom, ν, and
the significance level, α. Use the first column to look up ν, and the
top row to look up α. The place where they intersect gives the value x,
where P(X² > x) = α for X² ~ χ²(ν). In other words, it gives you the critical value.


As an example, if you wanted to find the critical value for testing at the
5% level with 8 degrees of freedom, you’d find 8 in the first column,
0.05 in the top row, and read off a value of 15.51. In other words, if
our test statistic X² was greater than 15.51, it would be in the critical
region at the 5% level with 8 degrees of freedom.
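If you don't have tables to hand, the table lookup can be reproduced numerically. The sketch below relies on a known closed form for the upper tail of the χ² distribution that only holds when ν is even: P(X² > x) = e^(-x/2) multiplied by the sum of (x/2)^k / k! for k from 0 to ν/2 - 1. The function names are hypothetical, and the critical value is found by simple bisection rather than by any library routine. For odd ν you would need tables or a statistics library instead.

```python
import math

def chi2_upper_tail(x, nu):
    """P(X^2 > x) for a chi-square distribution with an EVEN number nu
    of degrees of freedom, using the closed-form series."""
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this closed form needs a positive, even nu")
    half = x / 2
    series = sum(half ** k / math.factorial(k) for k in range(nu // 2))
    return math.exp(-half) * series

def chi2_critical_value(alpha, nu, lo=0.0, hi=200.0):
    """Find x with P(X^2 > x) = alpha by bisection (even nu only)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, nu) > alpha:
            lo = mid      # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical_value(0.05, 8), 2))   # compare with the table entry
print(round(chi2_upper_tail(15.51, 8), 3))      # tail probability at 15.51
```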

Here’s the column for α = 0.05, and here’s the row for ν = 8. The place where they meet, 15.51, is the critical value.

                                    Tail probability α
ν     .25     .20     .15     .10     .05     .025    .02     .01     .005    .0025   .001
1     1.32    1.64    2.07    2.71    3.84    5.02    5.41    6.63    7.88    9.14    10.83
2     2.77    3.22    3.79    4.61    5.99    7.38    7.82    9.21    10.60   11.98   13.82
3     4.11    4.64    5.32    6.25    7.81    9.35    9.84    11.34   12.84   14.32   16.27
4     5.39    5.99    6.74    7.78    9.49    11.14   11.67   13.28   14.86   16.42   18.47
5     6.63    7.29    8.12    9.24    11.07   12.83   13.39   15.09   16.75   18.39   20.51
6     7.84    8.56    9.45    10.64   12.59   14.45   15.03   16.81   18.55   20.25   22.46
7     9.04    9.80    10.75   12.02   14.07   16.01   16.62   18.48   20.28   22.04   24.32
8     10.22   11.03   12.03   13.36   15.51   17.53   18.17   20.09   21.95   23.77   26.12
9     11.39   12.24   13.29   14.68   16.92   19.02   19.68   21.67   23.59   25.46   27.88


Hypothesis testing with χ²


Here are the broad steps that are involved in hypothesis testing with the χ²
distribution.



1. Decide on the hypothesis you’re going to test, and its alternative

2. Find the expected frequencies and the degrees of freedom

3. Determine the critical region for your decision

4. Calculate the test statistic X²

5. See whether the test statistic is within the critical region

6. Make your decision


Look familiar? Most of these steps are exactly the same as for other 
hypothesis tests. In other words, it’s exactly the same process as before. 

There are no Dumb Questions


Q: So are χ² tests really just a special
kind of hypothesis test?

A: Yes, they are. You go through pretty
much all the steps you had to go through
before.

Q: Do I always use the upper tail for
my test?

A: Yes, if you’re conducting a hypothesis
test, you always use the upper tail. This is
because the higher the value of your X² test
statistic, the more your observed frequencies
differ from the expected frequencies.


Q: I think I’ve heard the term degrees
of freedom before. Have I?

A: Yes, you have. Remember when we
looked at how we can use the t-distribution
to create confidence intervals? Well, the
t-distribution uses degrees of freedom, too.

Q: I think I’ve seen degrees of freedom
referred to as df rather than ν. Is that
wrong?

A: Not at all. Different textbooks use
different conventions, and we’re using ν.
At the end of the day, they have the same
meaning.


Q: I want to look for information about
the χ² distribution on the Internet. How do
I find it? Do I need to type in Greek?

A: You should be able to find any
information you need by searching for the
term “chi square.” The χ² distribution is also
written “chi-squared.”
Exercise


It’s your job to see whether there’s sufficient evidence at the 5% level to say that the slot 
machines have been rigged. We’ll guide you through the steps. 


1. What’s the null hypothesis you’re going to test? What’s the alternate hypothesis? 


2. There are 4 degrees of freedom. What’s the region for the 5% level? 


3. What’s the test statistic? 



Hint: You calculated this earlier.


4. Is your test statistic inside or outside the critical region? 


5. Will you accept or reject the null hypothesis? 
Exercise solution


1. What’s the null hypothesis you’re going to test? What’s the alternate hypothesis? 

H0: The slot machine winnings per game follow the described probability distribution

x           -2      23      48      73      98
P(X = x)    0.977   0.008   0.008   0.006   0.001

H1: The slot machine winnings per game do not follow this probability distribution.


2. There are 4 degrees of freedom. What’s the region for the 5% level? 

From probability tables, χ²5%(4) = 9.49. This means that the critical region is where X² > 9.49.


3. What’s the test statistic? 

The test statistic is X². You found this earlier; its value is 38.27.


4. Is your test statistic inside or outside the critical region?

The value of X² is 38.27, and as the critical region is X² > 9.49, this means that X² is inside the critical
region.


5. Will you accept or reject the null hypothesis? 

The value of X² is inside the critical region, so this means we reject the null hypothesis. In other words,
there is sufficient evidence to reject the hypothesis that the slot machine winnings follow the described
probability distribution.


You’ve solved the slot machine mystery

Thanks to your careful use of the χ² probability distribution, you’ve found out
that there’s sufficient evidence that the slot machine isn’t following the probability 
distribution that the casino expects it to. Fat Dan is very grateful to you, as this 
means you’ve come up with evidence that the slot machine has been rigged in 
some way. He’s shut them down, so he doesn’t lose any more money. 


Out of Order 






Let’s summarize the steps you went through to discover this. 

First of all, you took a set of observed frequencies for the slot machine and 
calculated what you expected the frequencies to be, assuming they followed a 
particular probability distribution. You then calculated the degrees of freedom 
and calculated the test statistic X², which gave you an indication of the total
discrepancy between the observed frequencies and those you expected.

After this, you used the χ² probability tables to find the critical region of the
distribution at the 5% level of significance. You checked this against your test 
statistic and found that there was sufficient evidence to say that the slot machine 
has been rigged to pay out more money. 


This sort of hypothesis test is called a goodness of fit test. It tests whether 
observed frequencies actually fit in with an assumed probability distribution. 
You use this sort of test whenever you have a set of values that should fit a 
distribution, and you want to test whether the data actually does. 





Long Exercise


Fat Dan thinks that the dice in the dice games are loaded. Take a look at the following 
observed frequencies for one six-sided die, and test whether there’s enough evidence to 
support the claim that the die isn’t fair at the 1% significance level. We’ll guide you through the
steps. 

Here are the observed frequencies: 


Value         1     2     3     4     5     6
Frequency   107   198   192   125   132   248


Step 1: Decide on the hypothesis you’re going to test, and its alternative. 


Step 2: Find the expected frequencies and the degrees of freedom. 

Start off by completing the expected frequencies for the die. You'll need to take into account how many times the die is 
thrown in total, and the probability of getting each value. X represents the value of one toss of the die. 


X    Observed frequency    Expected frequency
1    107
2    198
3    192
4    125
5    132
6    248



Once you’ve found the expected frequencies, what is the number of degrees of freedom?

Hint: Find this the same way you found the degrees of freedom for the slot machines.


Step 3: Determine the critical region for your decision. 

You'll need to use the significance level and number of degrees of freedom.


Step 4: Calculate the test statistic X².

You can calculate this using your observed and expected frequencies from step 2. 


Step 5: See whether the test statistic is within the critical region. 


Step 6: Make your decision.


Long Exercise Solution


Fat Dan thinks that the dice in the dice games are loaded. Take a look at the following observed 
frequencies for one six-sided die, and test whether there’s enough evidence to support the claim 
that the die isn’t fair at the 1% significance level. We’ll guide you through the steps.

Here are the observed frequencies: 


Value         1     2     3     4     5     6
Frequency   107   198   192   125   132   248


Step 1: Decide on the hypothesis you’re going to test, and its alternative. 

To test whether the die is fair, we take its fairness as the null hypothesis. This gives you

H0: The die is fair, and every value has an equal chance of being thrown. This means the probability of
getting each value is 1/6.

H1: The die isn’t fair.


Step 2: Find the expected frequencies and the degrees of freedom. 

Start off by completing the expected frequencies for the die. You'll need to take into account how many times the die is 
thrown in total, and the probability of getting each value. X represents the value of one toss of the die. 


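X    Observed frequency    Expected frequency
1    107                   167
2    198                   167
3    192                   167
4    125                   167
5    132                   167
6    248                   167

The die is thrown 107 + 198 + 192 + 125 + 132 + 248 = 1002 times in total, so each expected frequency is 1002 x 1/6 = 167.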


Once you’ve found the expected frequencies, what is the number of degrees of freedom?


We had to find 6 expected frequencies, and their total had to equal 1002. In other words, we had to find
6 pieces of information with 1 restriction. This gives us

ν = 6 - 1
  = 5


Step 3: Determine the critical region for your decision. 

You'll need to use the significance level and number of degrees of freedom.

From probability tables, χ²1%(5) = 15.09. This means the critical region is where X² > 15.09.


Step 4: Calculate the test statistic X².

You can calculate this using your observed and expected frequencies from step 2. 

X² = (107-167)²/167 + (198-167)²/167 + (192-167)²/167 + (125-167)²/167 + (132-167)²/167 + (248-167)²/167

   = (-60)²/167 + 31²/167 + 25²/167 + (-42)²/167 + (-35)²/167 + 81²/167

   = (3600 + 961 + 625 + 1764 + 1225 + 6561)/167

   = 14736/167

   = 88.24


Step 5: See whether the test statistic is within the critical region. 

The critical region is given by X² > 15.09. As X² = 88.24, the test statistic is within the critical region.


Step 6: Make your decision. 

As your test statistic lies within the critical region, this means there is sufficient evidence at the 1% level
to reject the null hypothesis. In other words, you accept the alternate hypothesis that the die isn’t fair.
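
If you’d like to double-check the arithmetic, here’s a minimal Python sketch of the same goodness of fit test. The function name goodness_of_fit_statistic is a label made up for illustration, and the critical value 15.09 is copied from the probability tables rather than computed.

```python
# Observed frequencies for die values 1 to 6, taken from the exercise.
observed = [107, 198, 192, 125, 132, 248]


def goodness_of_fit_statistic(observed, probabilities):
    """Return (expected frequencies, X^2) for an assumed distribution.

    Hypothetical helper: expected = total x probability for each class,
    and X^2 is the sum of (O - E)^2 / E over all classes.
    """
    total = sum(observed)
    expected = [total * p for p in probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic


# Under H0 the die is fair, so every value has probability 1/6.
expected, x2 = goodness_of_fit_statistic(observed, [1 / 6] * 6)

degrees_of_freedom = len(observed) - 1  # 6 classes, 1 restriction (the total)
critical_value = 15.09                  # 1% level, 5 degrees of freedom (from tables)

print(round(x2, 2), degrees_of_freedom)
print("reject H0" if x2 > critical_value else "accept H0")
```

The sketch only automates the steps above; you still need the probability tables (or a statistics library) to find the critical value.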



The χ² goodness of fit test works for pretty much any
probability distribution.

You can use the χ² distribution to test the goodness of fit of any probability
distribution, just as long as you have a set of observed frequencies, and you can
work out what you expect the frequencies to be.

The hardest thing is working out what the number of degrees of freedom, ν, should be.
Here are the degrees of freedom for some of the most common probability
distributions you’ll want to use with the χ² goodness of fit test.


Distribution   Condition                                                Degrees of freedom

Binomial       You know what p is                                       ν = n - 1
               You don't know what p is, and you have to estimate       ν = n - 2
               it from the observed frequencies

Poisson        You know what λ is                                       ν = n - 1
               You don't know what λ is, and you have to estimate       ν = n - 2
               it from the observed frequencies

Normal         You know what μ and σ² are                               ν = n - 1
               You don't know what μ and σ² are, and you have to        ν = n - 3
               estimate them from the observed frequencies

p is the probability of success, or the proportion of successes in the population.
n is the total number of observed frequencies.
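
The pattern behind the table can be written as a one-line rule: start with the number of observed frequencies, subtract 1 for the restriction that the totals must match, and subtract 1 more for every parameter you had to estimate from the data. Here’s a small Python sketch of that rule; the function name and arguments are hypothetical, chosen just for this illustration.

```python
def goodness_of_fit_df(num_frequencies, num_estimated_parameters=0):
    """Degrees of freedom for a chi-square goodness of fit test.

    Hypothetical helper: one restriction for the matching total,
    plus one for each parameter estimated from the observed data.
    """
    restrictions = 1 + num_estimated_parameters
    return num_frequencies - restrictions


# Six-sided die with known probabilities: 6 frequencies, nothing estimated.
print(goodness_of_fit_df(6))      # 5

# Normal distribution fitted to 8 classes, with both parameters estimated.
print(goodness_of_fit_df(8, 2))   # 5
```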

Fat Dan has another problem

So far you’ve investigated whether the slot machines seem to be rigged 
in some way, by using a goodness of fit test to see whether the observed 
frequencies you have correspond to the expected probability distribution. 
Fat Dan has other problems, though, and this time it’s his staff. 

Fat Dan thinks he’s losing more money than he should from one of the 
croupiers on the blackjack tables. Can you determine whether there’s
significant evidence to show whether or not Fat Dan’s right? 

Here are the three croupiers who man the tables, along with the observed outcomes for each of them:

         Croupier A   Croupier B   Croupier C
Win          43           49           22
Draw          8            2            5
Lose         47           44           30


What we need is some way of testing whether the outcome of the game 
is dependent on which croupier is leading the game. 





What do you need to know in order to test this hypothesis? 

The χ² distribution can test for independence

So far we’ve looked at the χ² distribution in terms of performing goodness
of fit tests. This isn’t the only use of the χ² distribution. The χ² distribution
can also be used to perform tests of independence.

A χ² test for independence is a test to see whether two factors are
independent, or whether there seems to be some sort of association 
between them. This is just the situation we have with the croupiers. We 
want to test whether the croupier leading a game of blackjack has any 
impact on the outcome. In other words, we assume that the choice of 
croupier is independent of the outcome, unless there’s sufficient evidence 
against it. 

You conduct a test for independence in the same way you conduct a 
goodness of fit test. You set up a hypothesis, use the observed and expected 
frequencies to calculate the X² test statistic, and then see if it falls within the
critical region. 


Now hold it right there! I think
you’re missing something. How can we
work out the expected frequencies?
All we have to go on is the observed
frequencies, the actual game outcomes.


We need to know what the expected frequencies are
in order to calculate the test statistic X².

This means that we need some way of calculating the expected frequencies
from the observed frequencies. And it all comes down to probability...


You can find the expected frequencies using probability

There are a few steps you need to go through to find the expected 
frequencies. 

To start off, calculate the total frequencies for the outcomes and the croupiers,
and also the grand total. You can show the results in a table like this, called a
contingency table.

         Croupier A   Croupier B   Croupier C   Total
Win          43           49           22        114
Draw          8            2            5         15
Lose         47           44           30        121
Total        98           95           57        250


Now we can use this information to find the expected number of wins for each croupier. 
Let’s start by finding the expected frequency for the number of wins with croupier A. 






First off, we can use these grand totals to find the probability of getting a particular 
outcome, or a particular croupier. As an example, to find the probability of winning, you 
divide the total number of wins by the grand total: 

P(Win) = Total Wins / Grand Total

Similarly, you can find the probability of playing against croupier A by
dividing the total for croupier A by the grand total.

P(A) = Total A / Grand Total

Now if the croupier and the outcome of the game are independent, as we
assume they are, this means that you can find the probability of getting
a win with croupier A by multiplying together these two probabilities. In
other words:

P(Win and A) = (Total Wins / Grand Total) x (Total A / Grand Total)





How can we use this to find the expected number of wins for croupier A? 


So what are the frequencies? 

So far, we’ve found the probability of winning with croupier A, and
we want to use this to find the expected frequency of wins. To do this, we
just need to multiply the probability of winning with croupier A by the
grand total. The grand total cancels with one of the denominators, which gives us

Expected frequency = Grand Total x (Total Wins / Grand Total) x (Total A / Grand Total)

                   = (Total Wins x Total A) / Grand Total


In other words, to find the expected frequency of wins with croupier A, 
multiply the total number of wins by the total number of games with 
croupier A, and divide by the grand total. 

How do we find the frequencies in general?

You can generalize this so that you have a nice, easy result you can apply 
to every frequency you need to find. To find the expected frequency for a 
particular row and column combination, multiply the total for the row by 
the total for the column, and divide by the grand total. 


Expected frequency = (Row Total x Column Total) / Grand Total


Once you’ve figured out what all the expected frequencies are, you can use
these to calculate the test statistic X². It’s the same test statistic as before, so
you need to calculate

X² = Σ (O - E)²/E

For every observed frequency, subtract the expected frequency, square the result, and
divide by the expected frequency. Then add your results together.


The key is to ensure you include every observed frequency and every 
corresponding expected frequency. 
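
Here’s how the two formulas above might look as a short Python sketch. The function names expected_frequencies and chi_square_statistic are made up for this illustration, and the table is stored as a list of rows, which is just one convenient choice.

```python
def expected_frequencies(observed):
    """Expected frequency = (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )


# Rows are Win, Draw, Lose; columns are croupiers A, B, C.
observed = [
    [43, 49, 22],
    [8, 2, 5],
    [47, 44, 30],
]

expected = expected_frequencies(observed)
print(round(expected[0][0], 3))  # wins with croupier A: 44.688
```

Calling chi_square_statistic(observed, expected) then gives the test statistic, once you’re ready to check your own working.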


Here’s the table showing the observed frequencies for the croupiers. Your task is to figure out all 
the expected frequencies. 



         Croupier A   Croupier B   Croupier C   Total
Win          43           49           22        114
Draw          8            2            5         15
Lose         47           44           30        121
Total        98           95           57        250

Expected frequency = (row total x column total) / grand total. Fill out the rest of the expected
frequencies here.

         Croupier A             Croupier B   Croupier C
Win      (114x98)/250=44.688
Draw     (15x98)/250=5.88
Lose     (121x98)/250=47.432




Once you’ve found all the expected frequencies, calculate the test statistic X². Use the table below to help you. The
first column gives all the observed frequencies, the second column is for the corresponding expected frequencies,
and if you add together all the numbers in the third column, it gives you your test statistic.

Use the values in the first two columns to help you calculate the third.

Observed    Expected    (O - E)²/E
43          44.688      (43-44.688)²/44.688 = 2.85/44.688 = 0.064
8           5.88        (8-5.88)²/5.88 = 4.4944/5.88 = 0.764
47          47.432      (47-47.432)²/47.432 = 0.187/47.432 = 0.004
49
2
44
22
5
30

ΣO = 250    ΣE =        Σ(O - E)²/E =


Exercise Solution





Here’s the table showing the observed frequencies for the croupiers. Your task is to figure out all 
the expected frequencies. 



         Croupier A   Croupier B   Croupier C   Total
Win          43           49           22        114
Draw          8            2            5         15
Lose         47           44           30        121
Total        98           95           57        250

         Croupier A             Croupier B            Croupier C
Win      (114x98)/250=44.688    (114x95)/250=43.32    (114x57)/250=25.992
Draw     (15x98)/250=5.88       (15x95)/250=5.7       (15x57)/250=3.42
Lose     (121x98)/250=47.432    (121x95)/250=45.98    (121x57)/250=27.588


Once you’ve found all the expected frequencies, calculate the test statistic X². Use the table below to help you. The
first column gives all the observed frequencies, the second column is for the corresponding expected frequencies,
and if you add together all the numbers in the third column, it gives you your test statistic.

Observed    Expected    (O - E)²/E
43          44.688      (43-44.688)²/44.688 = 2.85/44.688 = 0.064
8           5.88        (8-5.88)²/5.88 = 4.4944/5.88 = 0.764
47          47.432      (47-47.432)²/47.432 = 0.187/47.432 = 0.004
49          43.32       (49-43.32)²/43.32 = 32.262/43.32 = 0.745
2           5.7         (2-5.7)²/5.7 = 13.69/5.7 = 2.402
44          45.98       (44-45.98)²/45.98 = 3.920/45.98 = 0.085
22          25.992      (22-25.992)²/25.992 = 15.936/25.992 = 0.613
5           3.42        (5-3.42)²/3.42 = 2.4964/3.42 = 0.730
30          27.588      (30-27.588)²/27.588 = 5.818/27.588 = 0.211

ΣO = 250    ΣE = 250    Σ(O - E)²/E = 5.618

This is your test statistic.


We still need to calculate degrees of freedom

Before we can use the χ² distribution to find the significance of the observed
frequencies, there’s just one more thing we need to find. We need to find ν, the
number of degrees of freedom.

You saw earlier that the number of degrees of freedom is the number of pieces 
of independent information we are free to choose, taking into account any 
restrictions. This means that we look at how many expected frequencies we have 
to calculate independently, and subtract the number of restrictions. 


First of all, let’s look at the total number of expected frequencies we had to 
calculate. We had to figure out the expected frequencies for the three croupiers 
and the three possible outcomes. This means that we worked out 3x3 = 9 
expected frequencies. 


[Figure: a 3x3 grid of croupiers A, B, and C against the outcomes Win, Draw, and Lose. We had to figure out 3 x 3 = 9 expected frequencies.]

Now for each row and for each column, we only actually had to calculate two of 
the expected frequencies. We knew what the total frequency should be, so we could 
choose the third to make sure that the frequencies added up to the right result. In 
other words, we only actually had to calculate 4 of the expected frequencies; the 
other 5 had to fit in with the total frequencies we already knew about. 


[Figure: the same 3x3 grid with four cells highlighted. We only had to calculate these four expected frequencies. We could figure out the others, including the last row and column, using the total frequency of each row and column.]


Since we had to calculate 4 expected frequencies, this gives us the number of
degrees of freedom. There were 4 pieces of independent information we had
to calculate; once we’d done that, the rest were known automatically. In other
words, ν = 4.

Another way of looking at this is that we needed to find 9 values overall, and
there were 5 values that we didn’t have to calculate independently. Using our
formula from before, this gives us ν = 9 - 5 = 4.
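
To see why only 4 values are free, here’s a minimal Python sketch that fills in the other 5 expected frequencies from the row and column totals alone. The helper complete_table is hypothetical, and we’ve assumed the 4 independently calculated values sit in the top-left corner of the table, each found using (row total x column total) / grand total.

```python
row_totals = [114, 15, 121]     # Win, Draw, Lose
column_totals = [98, 95, 57]    # Croupiers A, B, C

# The 4 expected frequencies calculated independently (top-left 2x2 block).
free_cells = [
    [44.688, 43.32],
    [5.88, 5.7],
]


def complete_table(free_cells, row_totals, column_totals):
    """Fill in the last column and last row so that every total matches."""
    # The last value in each row is whatever is left of the row total.
    table = [
        row + [row_totals[i] - sum(row)]
        for i, row in enumerate(free_cells)
    ]
    # The last row is whatever is left of each column total.
    last_row = [
        column_totals[j] - sum(row[j] for row in table)
        for j in range(len(column_totals))
    ]
    return table + [last_row]


table = complete_table(free_cells, row_totals, column_totals)
for row in table:
    print([round(value, 3) for value in row])
```

Nothing in the function needs the remaining 5 values as input, which is exactly what having 4 degrees of freedom means here.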

Long Exercise


Conduct a hypothesis test with a 1% significance level to see whether the outcome of the
game is independent of the croupier manning the table. Here’s a reminder of the steps, but 
remember you’ve worked out some of these already. 


1. Decide on the hypothesis you’re going to test, and its alternative. 

2. Find the expected frequencies and the degrees of freedom. 

3. Determine the critical region for your decision. 

4. Calculate the test statistic X².


5. See whether the test statistic is within the critical region. 

6. Make your decision. 

Long Exercise Solution


Conduct a hypothesis test with a 1% significance level to see whether the outcome of the
game is independent of the croupier manning the table. Here’s a reminder of the steps, but 
remember you’ve worked out some of these already. 

1. Decide on the hypothesis you’re going to test, and its alternative. 

2. Find the expected frequencies and the degrees of freedom. 

3. Determine the critical region for your decision. 

4. Calculate the test statistic X².

5. See whether the test statistic is within the critical region. 

6. Make your decision. 

Step 1:

We want to test whether the outcome of the game is independent of the croupier manning the table. This
means we use:

H0: There is no relationship between the outcome of the game and the croupier manning the table.

H1: There is a relationship between the outcome of the game and the croupier manning the table.


Step 2:

We found the expected frequencies in the earlier exercise, and we’ve just seen that the number of
degrees of freedom is 4.


Step 3:

From probability tables, χ²1%(4) = 13.28. This means the critical region is given by X² > 13.28.


Step 4:

We also calculated the test statistic X² using the frequencies in the earlier exercise. We found that
X² = 5.618.


Step 5:

The critical region is given by X² > 13.28, so this means X² is outside the critical region.


Step 6:

As X² is outside the critical region, we accept the null hypothesis. There is insufficient evidence that there’s
a relationship between outcome and croupier.
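
The six steps can also be strung together in code. What follows is only a sketch: independence_test is a hypothetical helper, and the critical value 13.28 is read from the probability tables and passed in by hand rather than computed.

```python
def independence_test(observed, critical_value):
    """Return (X^2, degrees of freedom, reject H0?) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency under independence of rows and columns.
            e = row_totals[i] * column_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom, statistic > critical_value


# Rows are Win, Draw, Lose; columns are croupiers A, B, C.
croupiers = [
    [43, 49, 22],
    [8, 2, 5],
    [47, 44, 30],
]

statistic, df, reject = independence_test(croupiers, critical_value=13.28)
print(round(statistic, 3), df, reject)  # 5.618 4 False
```

The same function should work for any other contingency table, as long as you look up the critical value that matches its degrees of freedom and significance level.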

Q: I’m still not sure I understand how you found the degrees of freedom for the
croupiers. Why are there four degrees of freedom?

A: We found the degrees of freedom by looking at how many expected frequencies
we had to calculate, and working out how many of these we could have calculated by
just looking at the total observed frequencies for each row and column.

There are three croupiers and three outcomes. If you use a contingency table to
calculate these, the row and column totals for the expected frequencies must match
those of the observed frequencies. This means that once you’ve calculated the first 2
expected frequencies for any row or column, the final one is determined by the overall
total. Therefore, you only need to calculate 2x2 expected frequencies from scratch. This
gives you your four degrees of freedom.

Q: Are there any other uses of the χ² distribution besides testing goodness of
fit and independence?

A: These are the two main uses of the χ² distribution. The thing to remember is
that you can use it to test the goodness of fit of virtually any probability distribution. As
an example, you can use it to test whether observed frequencies fit a particular binomial
distribution.

Q: Should I test at any particular level?

A: It depends on your situation. Just as with other hypothesis tests, the smaller the
level of significance, the stronger you need your evidence to be before you reject your
null hypothesis.

Testing at the 5% and 1% level of significance is common.




Take a look at how we calculated the degrees 
of freedom for a 3x3 table. How do you think 
we could generalize this? See if you can work 
this out, then turn the page. 

Generalizing the degrees of freedom

So far we’ve looked at the degrees of freedom for a 3x3 contingency table, 
but how do we generalize the result? 

Imagine you’re comparing two variables, and you have h rows of one variable 
and k columns of another. You know what the row and column totals should 
be. Now imagine you want to find the number of degrees of freedom. 



[Table: a contingency table with rows 1 to h and columns 1 to k.]

For each row, there are k columns. You know what the total of each row
should be, so you only actually need to calculate the expected frequency of
(k - 1) of the columns. You automatically know what the kth column is
because you know the total frequency of the row.

It’s a similar process for the columns. Each column has h rows, and you
know what the total of each column should be. This means that you have
to calculate (h - 1) of the rows for each column. You automatically know
what the value of the hth row is because you know the total frequency of the
column.

[Figure: for a single row, you need to calculate the first k - 1 columns, and you can figure out column k using the row total. For a single column, you need to calculate the first h - 1 rows, and you can figure out row h using the column total.]


And the formula is... 

If we put this together, the total number of expected frequencies
you have to calculate is (k - 1) x (h - 1). In other words, if you have a
table with dimensions h by k, you can find the degrees of freedom by
calculating

ν = (h - 1) x (k - 1)
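
As a quick sanity check, here’s the formula as a tiny Python function. The name independence_df is hypothetical, and the 2 by 4 table in the second call is an invented example rather than anything from the casino.

```python
def independence_df(num_rows, num_columns):
    """Degrees of freedom for a test of independence: (h - 1) x (k - 1)."""
    return (num_rows - 1) * (num_columns - 1)


print(independence_df(3, 3))  # the 3x3 croupier table: 4
print(independence_df(2, 4))  # an invented 2 by 4 table: 3
```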



[Table: a contingency table with rows 1 to h and columns 1 to k. You have to calculate (h - 1) x (k - 1) expected frequencies, so there are (h - 1) x (k - 1) degrees of freedom.]


Sharpen your pencil


Fat Dan has hired two more croupiers. What are the degrees of 
freedom now? The outcomes of the game remain the same. 

Sharpen your pencil Solution


Fat Dan has hired two more croupiers. What are the degrees of
freedom now? The outcomes of the game remain the same.


As Fat Dan has hired 2 more croupiers, this means we now have a 3 by 5 table.

         Croupier A   Croupier B   Croupier C   Croupier D   Croupier E
Win
Draw
Lose

The number of degrees of freedom is given by (h - 1) x (k - 1), where h is the number of rows, and k is the number
of columns. This gives us

ν = 2 x 4
  = 8


BULLET POINTS

■ The χ² distribution allows you to conduct goodness
of fit tests and test independence between
variables.

■ It takes a test statistic

X² = Σ (O - E)²/E

where O refers to observed frequencies, and E
refers to expected frequencies.

■ If we’re using test statistic X² with the χ²
distribution, we write

X² ~ χ²α(ν)

where ν is the number of degrees of freedom, and
α is the level of significance.

■ In a goodness of fit test, ν is the number of
classes minus the number of restrictions.

■ In a test for independence for two variables, if your
contingency table has h rows and k columns,

ν = (h - 1) x (k - 1)


You’ve saved the casino

Thanks to your mastery of the χ² distribution, you’ve managed to
unearth which of the casino games look like they’ve been rigged. You 
discerned explainable discrepancies between what you got and what you 
expected, and you also detected suspicious activity at certain levels of 
significance. 


Fat Dan is delighted with your efforts. Thanks to you, he knows which of 
his casino games need to be investigated, and the blackjack croupiers get 
to keep their jobs. Next time you’re in town, tell Fat Dan — he’ll supply 
you with extra chips, all on the house. 





Long Exercise


Fat Dan thinks that one or more of his croupiers are somehow influencing the results of the roulette 
wheel. Here’s data showing the observed frequency with which the ball lands in each color pocket 
for each of the croupiers. Conduct a test at the 5% level to see whether pocket color and croupier 
are independent, or whether there is sufficient evidence to show there might be something going on. 



         Croupier A   Croupier B   Croupier C
Red         375          367          357
Black       379          336          362
Green        46           37           41


Step 1: Decide on the hypothesis you’re going to test, and its alternative. 


Step 2: Find the expected frequencies and the degrees of freedom. Use the table of expected frequencies below.

Complete the row and column totals first. These are the same as for the observed frequencies above.

         Croupier A             Croupier B             Croupier C   Total
Red      1099x800/2300=382.3    1099x740/2300=353.6
Black    1077x800/2300=374.6
Green    124x800/2300=43.1
Total    800


Step 3: Determine the critical region for your decision.


Step 4: Calculate the test statistic X². Use the table below to help you.




Observed    Expected    (O - E)²/E
375         382.3       (375-382.3)²/382.3 = 53.29/382.3 = 0.139
379         374.6       (379-374.6)²/374.6 = 19.36/374.6 = 0.052
46          43.1        (46-43.1)²/43.1 = 8.41/43.1 = 0.195
367         353.6       (367-353.6)²/353.6 = 179.56/353.6 = 0.508
336
37
357
362
41

ΣO =        ΣE =        Σ(O - E)²/E =


Step 5: See whether the test statistic is within the critical region.


Step 6: Make your decision.



Long Exercise Solution


Fat Dan thinks that one or more of his croupiers are somehow influencing the results of the roulette 
wheel. Here’s data showing the observed frequency with which the ball lands in each color pocket 
for each of the croupiers. Conduct a test at the 5% level to see whether pocket color and croupier 
are independent, or whether there is sufficient evidence to show there might be something going on. 



         Croupier A   Croupier B   Croupier C
Red         375          367          357
Black       379          336          362
Green        46           37           41


Step 1: Decide on the hypothesis you’re going to test, and its alternative. 

You want to test whether or not pocket color is independent of croupier. This gives

H0: Roulette wheel pocket color and croupier are independent.
H1: Pocket color and croupier are not independent.


Step 2: Find the expected frequencies and the degrees of freedom. Use the table of expected frequencies below.


You find the expected frequencies by multiplying each row and column total, and dividing by the grand total.

         Croupier A             Croupier B             Croupier C             Total
Red      1099x800/2300=382.3    1099x740/2300=353.6    1099x760/2300=363.1    1099
Black    1077x800/2300=374.6    1077x740/2300=346.5    1077x760/2300=355.9    1077
Green    124x800/2300=43.1      124x740/2300=39.9      124x760/2300=41.0      124
Total    800                    740                    760                    2300

There are 3 columns and 3 rows, so we find the number of degrees of freedom by multiplying together (number of
rows - 1) and (number of columns - 1). This gives us

ν = 2 x 2
  = 4

Step 3: Determine the critical region for your decision.

From probability tables, χ²5%(4) = 9.49. This means that the critical region is given by X² > 9.49.


Step 4: Calculate the test statistic X². Use the table below to help you.




Observed    Expected    (O - E)²/E
375         382.3       (375-382.3)²/382.3 = 53.29/382.3 = 0.139
379         374.6       (379-374.6)²/374.6 = 19.36/374.6 = 0.052
46          43.1        (46-43.1)²/43.1 = 8.41/43.1 = 0.195
367         353.6       (367-353.6)²/353.6 = 179.56/353.6 = 0.508
336         346.5       (336-346.5)²/346.5 = 110.25/346.5 = 0.318
37          39.9        (37-39.9)²/39.9 = 8.41/39.9 = 0.211
357         363.1       (357-363.1)²/363.1 = 37.21/363.1 = 0.102
362         355.9       (362-355.9)²/355.9 = 37.21/355.9 = 0.105
41          41.0        (41-41.0)²/41.0 = 0/41.0 = 0

ΣO = 2300   ΣE = 2300   Σ(O - E)²/E = 1.630

This means the test statistic is given by X² = 1.630.


Step 5: See whether the test statistic is within the critical region. 

The critical region is given by X² > 9.49. As X² = 1.630, the test statistic is outside the critical region.


Step 6: Make your decision. 

As your test statistic lies outside the critical region, this means there is insufficient evidence at the 5%
level to reject the null hypothesis. In other words, you accept the null hypothesis that pocket color and croupier
are independent.

15 correlation and regression


What’s My Line?



Have you ever wondered how two things are connected? 

So far we’ve looked at statistics that tell you about just one variable — like men’s height, 
points scored by basketball players, or how long gumball flavor lasts — but there are other 
statistics that tell you about the connection between variables. Seeing how things are 
connected can give you a lot of information about the real world, information that you can 
use to your advantage. Stay with us while we show you the key to spotting connections: 
correlation and regression. 






Never trust the weather 


Concerts are best when they’re in the open air, or at least that’s
what these groovy guys think. They have a thriving business
organizing open-air concerts, and ticket sales for the summer
look promising.

Today’s concert looks like it will be one of their best ones ever.
The band has just started rehearsing, but there’s a cloud on the
horizon...

Before too long the sky’s overcast, temperatures are dipping, and
it looks like rain. Even worse, ticket sales are hit. The guys are in
trouble, and they can’t afford for this to happen again.

What the guys want is to be able to predict what concert attendance
will be given predicted hours of sunshine. That way, they’ll be able to
gauge the impact an overcast day is likely to have on attendance. If
it looks like attendance will fall below 3,500 people, the point where
ticket sales won’t cover expenses, then they’ll cancel the concert.

They need your help. 

Let’s analyze sunshine and attendance

Here’s sample data showing the predicted hours of sunshine and concert 
attendance for different events. How can we use this to estimate ticket 
sales based on the predicted hours of sunshine for the day? 


Sunshine (hours)              1.9   2.5   3.2   3.8   4.7   5.5   5.9   7.2
Concert attendance (100’s)     22    33    30    42    38    49    42    55


That’s easy. We can find the mean
and standard deviation and look at the 
distribution. That will tell us everything. 


Most of the time, that’s exactly the sort of thing we’d
need to do to predict likely outcomes. 

The problem this time is, what would we find the mean and standard 
deviation of? Would we use the concert attendance as the basis for our 
calculations, or would we use the hours of sunshine? Neither one of them 
gives us all the information that we need. Instead of considering just one set of 
data, we need to look at both. 

So far we’ve looked at independent random variables, but not ones that are 
dependent. We can assume that if the weather is poor, the probability of high 
attendance at an open air concert will be lower than if the weather is sunny. 
But how do we model this connection, and how do we use this to predict 
attendance based on hours of sunshine? 

It all comes down to the type of data. 





How would you go about modelling the connection 
between sets of data? 
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introducing bivariate data 


Open Air Concert Attendance 



Wy.wav'»a*tc data ^ov coutri 
^4 tclU you 

about V^ouvs su^mc. 


- > 

Attendance 

So what if we do need to know what the connection is between variables? While
univariate data can’t give us this information, there’s another type of data that
can: bivariate data.

All about bivariate data 

Bivariate data gives you the value of two variables for each observation, not 
just one. As an example, it can give you both the predicted hours of sunshine and 
the concert attendance for a single event or observation, like this. 


Sunshine (hours)              1.9   2.5   3.2   3.8   4.7   5.5   5.9   7.2
Concert attendance (100’s)     22    33    30    42    38    49    42    55


If one of the variables has been controlled in some way or is used to explain the 
other, it is called the independent or explanatory variable. The other variable 
is called the dependent or response variable. In our example, we want to use 
sunshine to predict attendance, so sunshine is the independent variable, and 
attendance is the dependent. 
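As a minimal sketch of the idea (the variable names are illustrative and not from the book), bivariate data can be held as one (x, y) pair per observation, with sunshine as the independent variable and attendance as the dependent variable:

```python
# Sample data from the table above. The names are hypothetical.
sunshine_hours = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]   # independent (explanatory) variable
attendance_100s = [22, 33, 30, 42, 38, 49, 42, 55]          # dependent (response) variable, in 100's

# Bivariate data: each observation carries a value for both variables.
observations = list(zip(sunshine_hours, attendance_100s))

for hours, attendance in observations:
    print(f"{hours} hours of sunshine, attendance {attendance * 100}")
```

Keeping the two values together in one pair is what separates this from two unrelated univariate data sets.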


Bivariate data gives you the
value of two variables for each
observation.


Exploring types of data 

Up until now, the sort of data we’ve been dealing with has been univariate. 

Univariate data concerns the frequency or probability of a single variable. As an 
example, univariate data could describe the winnings at a casino or the weights of 
brides in Statsville. In each case, just one thing is being described. 

What univariate data can’t do is show you connections between sets of data. For 
example, if you had univariate data describing the attendance figures at an open air 
concert, it wouldn’t tell you anything about the predicted hours of sunshine on that 
day. It would just give you figures for concert attendance. 




[Figure: scatter diagram “Concert Attendance and Sunshine”. Sunshine (hours) is on the x-axis, marked from 2 to 8, and attendance is on the y-axis. Each x marks one data point.]


Can you see how the scatter diagram helps you visualize patterns in the data?
Can you see how this might help us to define the connection between open air
concert attendance and the predicted number of hours of sunshine for the day?


Just as with univariate data, you can draw charts for bivariate data to 
help you see patterns. Instead of plotting a value against its frequency or 
probability, you plot one variable on the x-axis and the other variable against 
it on the y-axis. This helps you to visualize the connection between the two 
variables. 

This sort of chart is called a scatter diagram or scatter plot, and
drawing one of these is a lot like drawing any other sort of chart.

Start off by drawing two axes, one vertical and one horizontal. Use the 
x-axis for one variable and the y-axis for the other. The independent variable 
normally goes along the x-axis, leaving the dependent variable to go on 
the y-axis. Once you’ve drawn your axes, you then take the values for each 
observation and plot them on the scatter plot. 

Here’s a scatter plot showing the number of hours of sunshine and concert 
attendance figures for particular events or observations. As the predicted 
number of hours sunshine is the independent variable, we’ve plotted it on 
the x-axis. The concert attendance is the dependent variable, so that’s on the 
y-axis. 

Hours of sunshine goes on the x-axis; attendance goes on the y-axis.


Visualizing bivariate data 


Here’s the data:

x (sunshine)      1.9   2.5   3.2   3.8   4.7   5.5   5.9   7.2
y (attendance)     22    33    30    42    38    49    42    55
sharpen your pencil


We know we haven’t shown you how to analyze bivariate data yet,
but see how far you get in analyzing the scatter diagram for the 
concert organizers. 

What sort of patterns do you see in the chart? How can you relate 
this to the underlying data? What do you expect open air concert 
attendance to be like if it's sunny? What about if it's overcast? 


[Figure: scatter diagram “Concert Attendance and Sunshine”, with sunshine (hours) on the x-axis, marked from 2 to 8.]


The Case of the High Sunscreen Sales 

An intern at a sunscreen manufacturer has been given the task of 
looking at sunscreen sales in order to see how they can best market 
their particular brand. 

He’s been given a pile of generated scatter diagrams that model 
sunscreen sales against various other factors. He’s been asked to pull 
out ones where there seems to be some relationship between the 
two factors on the diagram, as this will help the sales team. 

The first diagram that the intern finds plots sunscreen sales for
the day against pollen count. He’s surprised to see that when
there’s a high pollen count, sales of sunscreen are significantly
higher, and he decides to tell the sales team that they need to think
about using pollen count in their advertising.

When the sales team hears his suggestion, they look at him blankly. 
What do you think the sales team should do? 

Does a high pollen count make people buy sunscreen? 
[Figure: scatter diagram “Concert Attendance and Sunshine”, with sunshine (hours) on the x-axis, marked from 2 to 8.]


First of all, the chart shows that the data points are clustered around a straight line on the chart, and this
line slopes upwards. It looks like, if the predicted number of hours of sunshine in a day is relatively low, then the
concert attendance is low too. If the number of hours of sunshine is high, then we can expect concert attendance to
be high too. This basically means the sunnier the weather, the more people you can expect to go to the open
air concert.

One thing that’s important to note is we can only be confident about saying this within the range of the
data. We have no data to say what the pattern is like if the number of hours of sunshine is below 2 hours or
above 7.5 hours.


Scatter diagrams show you patterns 

As you can see, scatter diagrams are useful because they show the 
actual pattern of the data. They enable you to more clearly visualize 
what connection there is between two variables, if indeed there’s any 
connection at all. 

The scatter diagram for the concert data shows a distinct pattern:
the data points are clustered along a straight line. We call this a
correlation.


sharpen solution


Sharpen your pencil
Solution

We know we haven’t shown you how to analyze bivariate data yet, 
but see how you get on with analyzing the scatter diagram for 
the concert organizers. 

What sort of patterns do you see in the chart? How can you relate 
this to the underlying data? What do you expect open air concert 
attendance to be like if it’s sunny? What about if it's overcast? 




Linear Correlations Up Close


Scatter diagrams show the correlation between pairs of values. 



Correlations are mathematical relationships between variables. You can 
identify correlations on a scatter diagram by the distinct patterns they form. 
The correlation is said to be linear if the scatter diagram shows the points 
lying in an approximately straight line. 


Let’s take a look at a few common types of correlation between two variables: 



Positive linear correlation 

Positive linear correlation is when low values on the x-axis 
correspond to low values on the y-axis, and higher values of 
x correspond to higher values of y. In other words, y tends to 
increase as x increases. 


Negative linear correlation 

Negative linear correlation is when low values on the x-axis 
correspond to high values on the y-axis, and higher values of 
x correspond to lower values of y. In other words, y tends to 
decrease as x increases. 


No correlation

If the values of x and y form a random pattern, then
we say there’s no correlation.
difference between correlation and causation


[Figure: chart “Coffee shops vs. record shops”, with the number of coffee shops on one axis.]


Correlation vs. causation


So if there's a correlation, 
does that mean one of the 
variables caused the value 
of the other? 


A correlation between two variables doesn’t necessarily 
mean that one caused the other or that they’re actually 
related in real life. 

A correlation between two variables means that there’s some sort of 
mathematical relationship between the two. This means that when we 
plot the values on a chart, we can see a pattern and make predictions about 
what the missing values might be. What we don’t know is whether there’s an 
actual relationship between the two variables, and we certainly don’t know 
whether one caused the other, or if there’s some other factor at work. 

As an example, suppose you gather data and find that over time, the number 
of coffee shops in a particular town increases, while the number of record 
shops decreases. While this may be true, we can’t say that there is a real-life 
relationship between the number of coffee shops and the number of record 
shops. In other words, we can’t say that the increase in coffee shops caused the 
decline in the record shops. What we can say is that as the number of coffee 
shops increases, the number of record shops decreases. 




Solved: The Case of the High Sunscreen Sales 

Does a high pollen count make people buy sunscreen? 

One of the sales team members walks over to the intern. 

“Thanks for the idea,” she says, “but we’re not going
to use it in our advertising. You see, the high pollen
count doesn’t make people buy more sunscreen.”

The intern looks at her, confused. “But it’s all here on
this scatter diagram. As pollen count increases, so do
sunscreen sales.”

“That’s true,” says the salesperson, “but that doesn’t mean that the high 
pollen count has caused the high sales. The days when the pollen count 
is high are generally days when the weather is sunny, so people are going 
outside more. They’re buying more sunscreen because they’re spending 
the day outside.” 
no dumb questions


There are no
Dumb Questions


So are we saying that the predicted sunshine causes low 
ticket sales? 

The bivariate data shows that there is a mathematical 
relationship between the two variables, but we can’t use it to 
demonstrate cause and effect. It’s intuitively possible that more 
people will go to open air concerts when it’s sunny, but we can’t say 
for certain that sunshine causes this. We’d need to do more research, 
as there may be other factors. 

Other factors? Like what? 

One example would be the popularity of the artist performing. If 
a well-known artist is holding a concert, then fans may want to go to 
the concert no matter what the weather. Similarly, an unpopular artist 
is unlikely to have the same dedication from fans. 


Do scatter diagrams use populations or samples of data? 

They can use either. A lot of the time, you’ll actually be using
samples, but the process of plotting a scatter diagram is the same
irrespective of whether you have a sample or a population.

If there’s a correlation between two variables, does it have 
to be linear? 

Correlation measures linear relationships, but not all
relationships are linear. As an example, a strong relationship
between two variables could be a distinctive curve, such as $y = x^2$. In
this chapter, we’re only going to be dealing with linear relationships,
though.
We need to predict the concert attendance


But hold on, man! How can we
predict concert attendance
based on predicted sunshine?

If the concert attendance
drops below 3,500, we’ll have to
bail out, and that’d be a bummer.


So far we’ve looked at what bivariate data is, and how scatter diagrams 
can show whether there’s a mathematical relationship between the two 
variables. What we haven’t looked at yet is how we can use this to make 
predictions. 

What we need to do next is see how we can use the data to make 
predictions for concert attendance, based on predicted hours of sunshine. 





How do you think we could go about making predictions like this for bivariate 
data? 
line of best fit


Predict values with a line of best fit 

So far you’ve seen how scatter diagrams can help you see whether there’s a 
correlation between values, by showing you if there’s some sort of pattern. 
But how can you use this to predict concert attendance, based on the 
predicted amount of sunshine? How would you use your existing scatter 
diagram to predict the concert attendance if you know how many hours of 
sunshine are expected for the day? 

One way of doing this is to draw a straight line through the points on the 
scatter diagram, making it fit the points as closely as possible. You won’t be 
able to get the straight line to go through every point, but if there’s a linear 
correlation, you should be able to make sure every point is reasonably close 
to the line you draw. Doing this means that you can read off an estimate 
for the concert attendance based on the predicted amount of sunshine. 




[Figure: scatter diagram with a straight line drawn through the points; sunshine (hours) is on the x-axis.]


The line that best fits the data points is called the line of best fit. 


A line of best fit? And
you just guess what the line is
based on what looks good to
you? That’s hardly scientific.


Drawing the line in this way is just a best guess. 

The trouble with drawing a line in this way is that it’s an estimate, so 
any predictions you make on the basis of it can be suspect. You have 
no precise way of measuring whether it’s really the best fitting line. It’s 
subjective, and the quality of the line’s fit depends on your judgment. 


Your best guess is still a guess 

Imagine if you asked three different people to draw what each of them 
think is the line of best fit for the open air concert data. It’s quite likely 
that each person would come up with a slightly different line of best fit, 
like this: 


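[Figure: three different lines of best fit drawn by eye on the same scatter diagram; sunshine (hours) is on the x-axis.]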


All three lines could conceivably be a line of best fit for the data, but 
what we can’t tell is which one’s really best. 

What we really need is some alternative to drawing the line of best
fit by eye. Instead of guessing what the line should be, it would be more
reliable if we had a mathematical or statistical way of using the data
we have available to find the line that fits best.


We need to find the equation of the line 

The equation for a straight line takes the form y = a + bx, where 
a is the point where the line crosses the y-axis, and b is the slope 
of the line. This means that we can write the line of best fit in the 
form y = a + bx. 

In our case, we’re using x to represent the predicted number of
hours of sunshine, and y to represent the corresponding open
air concert attendance figures. If we can use the concert attendance data to
somehow find the most suitable values of a and b, we’ll have a
reliable way to find the equation of the line, and a more reliable
way of predicting concert attendance based on predicted hours of
sunshine.



y = a + bx
line of best fit and sum of squared errors


We need to minimize the errors


Let’s take a look at what we need from the line of best fit, y = a + bx. 

The best fitting line is the one that most accurately predicts the true values of 
all the points. This means that for each known value of x, we need each of 
the y variables in the data set to be as close as possible to what we’d estimate 
them to be using the line of best fit. In other words, given a certain number 
of hours sunshine, we want our estimates for open air concert attendance to 
be as close as possible to the actual values. 

The line of best fit is the line y = a + bx that minimizes the distances between 
the actual observations of y and what we estimate those values of y to be for 
each corresponding value of x. 



y = a + bx 




Let’s represent each of the y values in our data set using $y_i$, and its
estimate using the line of best fit as $\hat{y}_i$. This is the same notation that
we used for point estimators in previous chapters, as the ^ symbol
indicates estimates.

We want to minimize the total distance between each actual value of y
and our estimate of it based on the line of best fit. In other words, we
need to minimize the total differences between $y_i$ and $\hat{y}_i$. We could try
doing this by minimizing

$$\sum (y_i - \hat{y}_i)$$

but the problem with this is that the positive and negative differences cancel
each other out. We need to take a slightly different approach, and it’s
one that we’ve seen before.


Introducing the sum of squared errors

Can you remember when we first derived the variance? We wanted to look
at the total distance between sets of values and the mean, but the total 
distances cancelled each other out. To get around this, we added together 
all the distances squared instead to ensure that all values were positive. 

We have a similar situation here. Instead of looking at the total distance 
between the actual and expected points, we need to add together the 
distances squared. That way, we make sure that all the values are positive. 

The total sum of the distances squared is called the sum of squared 
errors, or SSE. It’s given by: 


$$\text{SSE} = \sum (y - \hat{y})^2$$


In other words, we take each value of y, subtract the predicted value of y 
from the line of best fit, square it, and then add all the results together. 
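The calculation described above can be sketched in a few lines of Python. This is an illustration only: the function name `sse` and the two candidate lines are assumptions chosen for the example, not values from the book.

```python
# Concert data from the chapter: sunshine (hours) and attendance (100's).
xs = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
ys = [22, 33, 30, 42, 38, 49, 42, 55]


def sse(a, b, xs, ys):
    """Sum of squared errors for the candidate line y = a + b*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = a + b * x          # predicted value from the line
        total += (y - y_hat) ** 2  # squared error is never negative
    return total


# Two hypothetical lines drawn "by eye"; the one with the smaller SSE fits more closely.
sse_line_1 = sse(10.0, 6.0, xs, ys)
sse_line_2 = sse(20.0, 5.0, xs, ys)
print(sse_line_1, sse_line_2)
```

Because every term is squared, positive and negative errors can no longer cancel, so a smaller SSE always means the points sit closer to the line overall.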


The SSE reminds me of the variance. 

The variance uses squared distances from 
the mean, and the SSE uses squared 
distances from the line. 


The variance and SSE are calculated in similar ways. 

The SSE isn’t the variance, but it does deal with the distance squared between 
two particular points. It gives the total of the distances squared between the 
actual value of y and what we predict the value of y to be, based on the line 
of best fit. 

What we need to do now is use the data to find the values of a and b that 
minimize the SSE, based on the line y = a + bx. 
calculating b for the line of best fit


Find the equation for the line of best fit

We’ve said that we want to minimize the sum of squared errors, $\sum (y - \hat{y})^2$,
where y = a + bx. By doing this, we’ll be able to find optimal values for a
and b, and that will give us the equation for the line of best fit.

Let's start with b 

The value of b for the line y = a + bx gives us the slope, or steepness, of 
the line. In other words, b is the slope for the line of best fit. 


We’re not going to show you the proof for this, but the value of b that
minimizes the SSE $\sum (y - \hat{y})^2$ is given by

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

In the numerator, each x value minus the mean of the x values is multiplied by
the corresponding y value minus the mean of the y values.

The calculation looks tricky at first, but it’s not that 
difficult with practice. 


First of all, find $\bar{x}$ and $\bar{y}$, the means of the x and y values for the data that
you have. Once you’ve done that, calculate $(x - \bar{x})$ multiplied by $(y - \bar{y})$ for
every observation in your data set, and add the results together. Finally,
divide the whole lot by $\sum (x - \bar{x})^2$. This last part of the equation is very
similar to how you calculate the variance of a sample. The only difference
is that you don’t divide by (n - 1). You can also get software packages that
work all of this out for you.

Let’s take a look at how you use this in practice. 



Relax


If you need to calculate this in
an exam, you will almost
certainly be given the formula.


This means that you won’t have to memorize 
the formula, just know how to use it. 
Finding the slope for the line of best fit

Let’s see if we can use this to find the slope of the line y = a + bx for 
the concert data. First of all, here’s a reminder of the data: 


x (sunshine)      1.9   2.5   3.2   3.8   4.7   5.5   5.9   7.2
y (attendance)     22    33    30    42    38    49    42    55


Let’s start by finding the values of $\bar{x}$ and $\bar{y}$, the sample means of the x
and y values. We calculate these in exactly the same way as before, so

$$\bar{x} = (1.9 + 2.5 + 3.2 + 3.8 + 4.7 + 5.5 + 5.9 + 7.2)/8 = 34.7/8 = 4.3375$$

$$\bar{y} = (22 + 33 + 30 + 42 + 38 + 49 + 42 + 55)/8 = 311/8 = 38.875$$

Now that we’ve found $\bar{x}$ and $\bar{y}$, we can use them to help us find the
value of b using the formula for b.


We use $\bar{x}$ and $\bar{y}$ to help us find b


The first part of the formula is $\sum (x - \bar{x})(y - \bar{y})$. To find this, we take the x
and y values for each observation, subtract $\bar{x}$ from the x value, subtract
$\bar{y}$ from the y value, and then multiply the two together. Once we’ve
done this for every observation, we then add the whole lot up together.

$$
\begin{aligned}
\sum (x - \bar{x})(y - \bar{y}) &= (1.9 - 4.3375)(22 - 38.875) + (2.5 - 4.3375)(33 - 38.875) + (3.2 - 4.3375)(30 - 38.875) \\
&\quad + (3.8 - 4.3375)(42 - 38.875) + (4.7 - 4.3375)(38 - 38.875) + (5.5 - 4.3375)(49 - 38.875) \\
&\quad + (5.9 - 4.3375)(42 - 38.875) + (7.2 - 4.3375)(55 - 38.875) \\
&= (-2.4375)(-16.75) + (-1.8375)(-5.875) + (-1.1375)(-8.875) + (-0.5375)(3.125) + (0.3625)(-0.875) \\
&\quad + (1.1625)(10.125) + (1.5625)(3.125) + (2.8625)(16.125) \\
&= 40.828125 + 10.7953125 + 10.0953125 - 1.6796875 - 0.3171875 + 11.7703125 + 4.8828125 + 46.1578125 \\
&= 122.53 \text{ (to 2 decimal places)}
\end{aligned}
$$
calculating b for the line of best fit, part deux


Finding the slope for the line of best fit, part ii

Here’s a reminder of the data for concert attendance and predicted hours of sunshine: 


x (sunshine)      1.9   2.5   3.2   3.8   4.7   5.5   5.9   7.2
y (attendance)     22    33    30    42    38    49    42    55



Here’s a reminder of the formula:

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$


We’re part of the way through calculating the value of b, where y = a + bx.
We’ve found that $\bar{x} = 4.3375$, $\bar{y} = 38.875$, and $\sum (x - \bar{x})(y - \bar{y}) = 122.53$. The
final thing we have left to find is $\sum (x - \bar{x})^2$. Let’s give it a go.




$$
\begin{aligned}
\sum (x - \bar{x})^2 &= (1.9 - 4.3375)^2 + (2.5 - 4.3375)^2 + (3.2 - 4.3375)^2 + (3.8 - 4.3375)^2 \\
&\quad + (4.7 - 4.3375)^2 + (5.5 - 4.3375)^2 + (5.9 - 4.3375)^2 + (7.2 - 4.3375)^2 \\
&= (-2.4375)^2 + (-1.8375)^2 + (-1.1375)^2 + (-0.5375)^2 + (0.3625)^2 + (1.1625)^2 + (1.5625)^2 + (2.8625)^2 \\
&= 23.02 \text{ (to 2 decimal places)}
\end{aligned}
$$

We don’t use y or $\bar{y}$ in this part of the equation.


We find the value of b by dividing $\sum (x - \bar{x})(y - \bar{y})$ by $\sum (x - \bar{x})^2$. This gives us

$$b = 122.53/23.02 = 5.32$$

This gives us the slope of the line. In other words, the line of best fit for the data is y = a + 5.32x. But what’s a?
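The same steps can be sketched in Python. The helper names `mean` and `slope` are assumptions made for this illustration. Worked without rounding any intermediate results, the code returns a slope of about 5.3 for the concert data.

```python
# Concert data from the chapter: sunshine (hours) and attendance (100's).
xs = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
ys = [22, 33, 30, 42, 38, 49, 42, 55]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def slope(xs, ys):
    """Least squares slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar = mean(xs)
    y_bar = mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


b = slope(xs, ys)
print(round(b, 2))
```

A positive result matches the upward-sloping pattern on the scatter diagram.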



Dumb Questions


It looks like the formulas you’ve 
given are for samples rather than 
populations. Is that right? 

That’s right. We’ve used samples
rather than populations because the data
we’ve been given is a sample. There’s
nothing to stop you using a population if you
have the data; just use $\mu$ instead of $\bar{x}$.


Is the value of b always positive? 


No, it isn’t. Whether b is positive 
or negative actually depends on the type 
of linear correlation. For positive linear 
correlation, b is positive. For negative linear 
correlation, b is negative. 


I’ve heard of the term gradient. 
What’s that? 


Gradient is another term for the slope 
of the line, b. 
What about if there’s no
correlation? Can I still work out b?

If there’s no correlation, you can still 
technically find a line of best fit, but it won’t 
be an effective model of the data, and you 
won’t be able to make accurate predictions 
using it. 

Is there an easy way of calculating 
b? 

Calculating b is tricky if you have lots 
of observations, but you can get software 
packages to calculate this for you. 


















[Figure: scatter diagram of attendance against sunshine (hours), with the line y = 15.80 + 5.32x drawn through the points.]


We’ve found b, but what about a?

So far we’ve found what the optimal value of b is for the line of best fit
y = a + bx. What we don’t know yet is the value of a.


I’m sure we’d be able
to find a if we knew
one of the points it
should go through.


The line needs to go through point $(\bar{x}, \bar{y})$.

The line of best fit goes through the point $(\bar{x}, \bar{y})$, the
means of x and y. We can use this to find a by substituting $\bar{x}$ and $\bar{y}$
into the equation for the line y = a + bx. This gives us

$$\bar{y} = a + b\bar{x}$$

or

$$a = \bar{y} - b\bar{x}$$

We’ve already found values for $\bar{x}$, $\bar{y}$, and b. Substituting in these values
gives us

$$a = 38.875 - 5.32(4.3375) = 38.875 - 23.0755 = 15.80 \text{ (to 2 decimal places)}$$

This means that the line of best fit is given by

y = 15.80 + 5.32x
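As a quick check of the substitution, here is a small Python sketch. The function name `intercept` is an assumption for this example, and the slope of 5.32 is the rounded value worked out earlier in the chapter.

```python
# Concert data from the chapter: sunshine (hours) and attendance (100's).
xs = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
ys = [22, 33, 30, 42, 38, 49, 42, 55]


def intercept(b, xs, ys):
    """Intercept a = y_bar - b * x_bar, so the line passes through (x_bar, y_bar)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return y_bar - b * x_bar


b = 5.32                      # rounded slope from the hand calculation
a = intercept(b, xs, ys)
print(round(a, 2))

# The fitted line evaluated at x_bar gives back y_bar.
x_bar = sum(xs) / len(xs)
print(a + b * x_bar)
```

Choosing a this way is what pins the line to the point of means; any other intercept would shift the line up or down away from it.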



Relax


If you’re taking a statistics 
exam, it’s likely you’ll be 
given this formula. 


This means that you’re unlikely to have to 
memorize it, you just need to know how 
to use it. 
least squares regression in depth


Least Squares Regression Up Close


The mathematical method we’ve been using to find the line of best 
fit is called least squares regression. 


Least squares regression is a mathematical way of fitting a line
of best fit to a set of bivariate data. It’s a way of fitting a line y = a +
bx to a set of values so that the sum of squared errors is minimized.
In other words, the squared distances between the actual values and
their estimates are made as small as possible in total. The sum of squared errors is given by

$$\text{SSE} = \sum (y - \hat{y})^2$$

Once you’ve found the line, you can use it
to predict the value of y, given a value of x. To do this, just substitute
your x value into the equation y = a + bx.

The line y = a + bx is called the regression line. 
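Here is a minimal sketch of using the regression line for prediction, assuming the values a = 15.80 and b = 5.32 found for the concert data. The function names and the range check are illustrative additions, not part of the book.

```python
A = 15.80   # intercept of the regression line for the concert data
B = 5.32    # slope of the regression line

# Smallest and largest sunshine values in the sample.
X_MIN, X_MAX = 1.9, 7.2


def predict(x):
    """Predicted attendance (in 100's) for x hours of predicted sunshine."""
    return A + B * x


def in_data_range(x):
    """True if x lies inside the range of sunshine values we have data for."""
    return X_MIN <= x <= X_MAX


for hours in (3.0, 6.0, 10.0):
    note = "" if in_data_range(hours) else " (outside the data range, treat with caution)"
    print(f"{hours} hours: {predict(hours):.2f}{note}")
```

The range check is only a reminder that the line describes the points it was fitted to; the formula still returns a number for any x, whether or not the data supports it.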


Watch it!

When you’re predicting values of y
for a particular value of x, be wary
of predicting values that fall outside
the area you have data points for.

Linear regression is just an estimate based on the 
information you have, and it shows the relationship 
between the data points you know about. This doesn’t 
mean that it applies well beyond the limits of the data 
Sharpen your pencil


We’ve found an equation for the regression line, so now the 
concert organizers have a couple of questions for you. As a 
reminder, the regression line is given by 

y = 15.80 + 5.32x 

where x is the predicted hours of sunshine, and y is the concert 
attendance in 100’s. 


The predicted amount of sunshine on the day of the next concert is 6 hours. What do you expect concert 
attendance to be? 


If concert attendance looks like it’s dropping below 3,500, the concert organizers won’t make a profit and will 
have to cancel the concert. What’s the corresponding number of hours of predicted sunshine? 
sharpen solution



We’ve found an equation for the regression line, so now the 
concert organizers have a couple of questions for you. As a 
reminder, the regression line is given by 

y = 15.80 + 5.32x 

where x is the predicted hours of sunshine, and y is the concert 
attendance in 100’s. 


The predicted amount of sunshine on the day of the next concert is 6 hours. What do you expect concert 
attendance to be? 

As x is the predicted number of hours of sunshine, x = 6. We need to find the
corresponding prediction for concert attendance, so this means we need to find y for this value of x.

y = 15.80 + 5.32x

= 15.80 + 5.32 x 6

= 15.80 + 31.92

= 47.72

As y is in 100’s, this means that the expected concert attendance is 47.72 x 100 = 4772.


If concert attendance looks like it's dropping below 3,500, the concert organizers won’t make a profit and will 
have to cancel the concert. What’s the corresponding number of hours of predicted sunshine? 


This time, we need to find the value of x for a particular value of y.

The attendance is 3,500, which means y = 35. This gives us

y = 15.80 + 5.32x

35 = 15.80 + 5.32x

35 - 15.80 = 5.32x

19.2 = 5.32x

x = 19.2/5.32 = 3.61 (to 2 decimal places)

In other words, we’d predict concert attendance to be below 3,500 if the predicted hours of sunshine is
below 3.61 hours.
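Both answers can be checked with a short Python sketch. The function names are assumptions for this example; the coefficients are the ones from the regression line y = 15.80 + 5.32x.

```python
A = 15.80   # intercept of the regression line
B = 5.32    # slope of the regression line


def attendance_for_hours(x):
    """Predicted attendance (in 100's) for x hours of predicted sunshine."""
    return A + B * x


def hours_for_attendance(y):
    """Rearranged line, x = (y - a) / b: hours of sunshine that predict attendance y (in 100's)."""
    return (y - A) / B


expected = attendance_for_hours(6) * 100      # convert from 100's to people
break_even_hours = hours_for_attendance(35)   # 3,500 people is y = 35

print(round(expected))
print(round(break_even_hours, 2))
```

Rearranging the line works here because the slope is not zero; with a flat line there would be no single x for a given y.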
You’ve made the connection

So far you’ve used linear regression to model the connection between 
predicted hours of sunshine and concert attendance. Once you know what 
the predicted amount of sunshine is, you can predict concert attendance 
using y = a + bx. 

Being able to predict attendance means you’ll be able to really help the 
concert organizers know what they can expect ticket sales to be, and also 
what sort of profit they can reasonably expect to make from each event. 


That’s awesome, dude! But just one
question. How accurate is this exactly?


It’s the line of best fit, but we don’t know how 
accurate it is. 

The line y = a + bx is the best line we could have come up with, but 
how accurately does it model the connection between the amount 
of sunshine and the concert attendance? There’s one thing left to 
consider, the strength of correlation of the regression line. 

What would be really useful is if we could come up with some way of 
indicating how far the points are dispersed away from the line, as that 
will give an indication of how accurate we can expect our predictions 
to be based on what we already know. 

Let’s look at a few examples. 






Why do you think it’s important to know the strength of the correlation? What 
difference do you think this would make to the concert organizers? 


types of correlations


Let’s look at some correlations


The line of best fit of a set of data is the best line we can come up with to 
model the mathematical relationship between two variables. 

Even though it’s the line that fits the data best, it’s unlikely that the line 
will fit precisely through every single point. Let’s look at some different 
sets of data to see how closely the line fits the data. 



Accurate linear correlation 

For this set of data, the linear correlation is an accurate 
fit of the data. The regression line isn’t 100% perfect, 
but it’s very close. It’s likely that any predictions made 
on the basis of it will be accurate. 


No linear correlation

For this set of data, there is no linear correlation. It’s 
possible to calculate a regression line using least squares 
regression, but any predictions made are unlikely to be 
accurate. 


Can you see what the problem is?

Both sets of data have a regression line, but the actual fit of the data varies 
quite a lot. For the first set of data, the correlation is very tight, but for the 
second, the points are scattered too widely for the regression line to be 
useful. 

Least squares estimates can be used to predict values, so it
would be helpful if there were some way of indicating how tightly the data
points fit the line, and how accurate we can expect any predictions to be
as a result.

There’s a way of calculating the fit of the line, called the correlation 
coefficient. 
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correlation and regression 


The correlation coefficient measures how well the line fits the data 

The correlation coefficient is a number between -1 and 1 that describes the
scatter of data points away from the line of best fit. It’s a way of gauging how
well the regression line fits the data. It’s normally represented by the letter r.


• If r is -1, the data has a perfect negative linear correlation, with all of the
data points in a straight line. If r is 1, the data has a perfect positive linear
correlation. If r is 0, then there is no linear correlation.

Usually r is somewhere between these values, as -1, 0, and 1
are all extreme.

If r is negative, then there’s a negative linear correlation
between the two variables. The closer r gets to -1, the stronger
the correlation, and the closer the points are to the line.

If r is positive, then there’s a positive linear correlation
between the variables. The closer r gets to 1, the stronger the
correlation.

In general, as r gets closer to 0, the linear correlation gets 
weaker. This means that the regression line won’t be able to 
predict y values as accurately as when r is close to 1 or -1. The 
pattern might be random, or the relationship between the 
variables might not be linear. 

If we can calculate r for the concert data, we’ll have an idea 
of how accurately we can predict concert attendance based 
on the predicted hours of sunshine. So how do we calculate r? 
Turn the page and we’ll show you how. 


I’m the correlation coefficient, r. I say how strong the
correlation is between the two variables.


There's a formula for calculating the correlation coefficient r 


So how do we calculate the correlation coefficient, r? 


We’re not going to show you the proof for this, but the correlation 
coefficient r is given by 


$$ r = \frac{b \, s_x}{s_y} $$

where s_x is the standard deviation of the x values in the sample, and s_y is
the standard deviation of the y values.



We’ve already done most of the hard work. 

Since we’ve already calculated b, all we have left to find is s_x and s_y. What’s
more, we’re already most of the way towards finding s_x.

When we calculated b, we needed to find the value of Σ(x - x̄)². If we divide
this by n - 1, this actually gives us the sample variance of the x values. If we
then take the square root, we’ll have s_x. In other words,

$$ s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

s_x is the standard deviation of the x values in the sample. It’s the same
formula you’ve seen before.







The only remaining piece of the equation we have to find is s_y, the standard
deviation of the y values in the sample. We calculate this in a similar way to
finding s_x.

$$ s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} $$


Let’s try finding what r is for the concert attendance data.


Find r for the concert data

Let’s use the formula to find the value of r for the concert data. First of 
all, here’s a reminder of the data: 


x (sunshine)     1.9   2.5   3.2   3.8   4.7   5.5   5.9   7.2
y (attendance)   22    33    30    42    38    49    42    55


To find r, we need to know the values of b, s_x, and s_y so that we can use
them in the formula for r. So far we’ve found that

b = 5.32     (This is the slope of the line of best fit that we found earlier.)

but what about s_x and s_y?

Let’s start with s_x. We found earlier that Σ(x - x̄)² = 23.02, and we know
that the sample size is 8. This means that if we divide 23.02 by 7, we’ll
have the sample variance of x. To find s_x, we take the square root.

$$ s_x = \sqrt{23.02 / 7} = 1.81 \text{ (to 2 decimal places)} $$


The only piece of the formula we have left to find is s_y. We already know
that ȳ = 38.875, as we found it earlier on, so this means that

$$ \begin{aligned}
\sum (y - \bar{y})^2 &= (22 - 38.875)^2 + (33 - 38.875)^2 + (30 - 38.875)^2 + (42 - 38.875)^2 + (38 - 38.875)^2 \\
&\quad + (49 - 38.875)^2 + (42 - 38.875)^2 + (55 - 38.875)^2 \\
&= (-16.875)^2 + (-5.875)^2 + (-8.875)^2 + (3.125)^2 + (-0.875)^2 + (10.125)^2 + (3.125)^2 + (16.125)^2 \\
&= 780.875
\end{aligned} $$


We can now use this to find s_y, by dividing by n - 1 and taking the square
root.

$$ s_y = \sqrt{780.875 / 7} = \sqrt{111.55357} = 10.56 \text{ (to 2 decimal places)} $$

Finally, we use the y values in the sample to find s_y, the standard deviation of y.


All we need to do now is use b, s_x, and s_y to find the value of the
correlation coefficient r.


Find r for the concert data, continued

Now that we’ve found that b = 5.32, s_x = 1.81, and s_y = 10.56,
we can put them together to find r.

$$ r = \frac{b \, s_x}{s_y} = \frac{5.32 \times 1.81}{10.56} = 0.91 \text{ (to 2 decimal places)} $$

As r is very close to 1, this means that there’s strong positive 
correlation between open air concert attendance and hours 
of predicted sunshine. In other words, based on the data that 
we have, we can expect the line of best fit, y = 15.80 + 5.32x, 
to give a reasonably good estimate of the expected concert 
attendance based on the predicted hours of sunshine. 
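
If you’d like to check the arithmetic by machine, here is a minimal Python sketch of the same calculation. The function names (`mean`, `sample_sd`, `slope`) are our own illustrative choices, not anything from a library. Because the sketch doesn’t round b, s_x, or s_y along the way, its results can differ slightly in the second decimal place from the hand calculation.

```python
from math import sqrt

# Concert data: predicted hours of sunshine (x) and attendance (y).
sunshine = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
attendance = [22, 33, 30, 42, 38, 49, 42, 55]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation: divide by n - 1, then take the square root."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def slope(xs, ys):
    """Least squares slope b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


b = slope(sunshine, attendance)
s_x = sample_sd(sunshine)
s_y = sample_sd(attendance)
r = b * s_x / s_y  # the formula r = b * s_x / s_y

print(f"b = {b:.2f}, s_x = {s_x:.2f}, s_y = {s_y:.2f}, r = {r:.2f}")
```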


there are no Dumb Questions

Q: I’ve seen other ways of calculating r. Are they wrong?

A: There are several different forms of the equation for finding r, but
underneath, they’re basically the same. We’ve used the simplest form of the
equation so that it’s easier to see what you’ve already calculated through
finding b.

Q: Are the results accurate with such a small sample?

A: A larger sample would definitely be better, but we used a small sample
just to make the calculations easier to follow.

Q: You haven’t proved or derived why you calculate the values of b and r in
this way. Why not?

A: Deriving the formulas for b and r is quite complex and involved, so we’ve
decided not to go through this in the book. The key thing is that you
understand when and how to use them.

Q: What’s the expected concert attendance if the predicted hours of
sunshine is 0?

A: We can’t say for certain because this is quite a way outside the range of
data we have. The line of best fit is a pretty good estimate for the range of
data that we have, but we can’t say with any certainty what the concert
attendance will be like outside this range. The data might follow a different
pattern outside this range, so any estimate we gave would be unreliable.

Q: When we were looking at averages, we saw that univariate data can have
outliers. What about bivariate data?

A: Yes, bivariate data can have outliers too. Outliers are points that lie a
long way from your regression line. If you have outliers, then this can mean
that you have anomalies in your data set, or alternatively, that your
regression line isn’t a good fit of the data.

[Figure: scatter diagram of attendance (100s), y, against sunshine (hours), x]


Q: I’ve heard of influential observations. What are they?

A: Influential observations are points that lie a long way horizontally from
the rest of the data. Because of this, they have the effect of pulling the
regression line towards them.

Q: So is an influential observation the same as an outlier?

A: No. Outliers lie a long way from the line. Influential observations lie a
long way horizontally from the data.


You’ve saved the day!

The concert organizers are amazed at the work you’ve done with 
their concert data. They now have a way of predicting what 
attendance will be like at their concerts based on the weather 
reports, which means they have a way of maximizing their profits. 




Long Exercise


The evil Swindler has been collecting data on the effect radiation exposure has on Captain 
Amazing’s super powers. Here is the number of minutes of exposure to radiation, paired with 
the number of tons Captain Amazing is able to lift: 


Radiation exposure (minutes)   3    3.5   4    4.5   5    5.5   6    6.5   7
Weight (tons)                  14   14    12   10    8    9.5   8    9     6


Your job is to use least squares regression to find the line of best fit, and then find the correlation coefficient to 
describe the strength of the relationship between your line and the data. Sketch the scatter diagram too. 

If Swindler exposes Captain Amazing to radiation for 5 minutes, what weight do you expect Captain Amazing to be
able to lift?


Long Exercise Solution


The evil Swindler has been collecting data on the effect radiation exposure has on Captain 
Amazing’s super powers. Here is the number of minutes of exposure to radiation, paired with 
the number of tons Captain Amazing is able to lift: 


Radiation exposure (minutes)   4    4.5   5    5.5   6    6.5   7
Weight (tons)                  12   10    8    9.5   8    9     6


Your job is to use least squares regression to find the line of best fit, and then find the correlation coefficient to 
describe the strength of the relationship between your line and the data. Sketch the scatter diagram too. 


If Swindler exposes Captain Amazing to radiation for 5 minutes, what weight do you expect Captain Amazing to be 
able to lift? 


Let’s use x to represent minutes of radiation exposure and y to represent weight in tons. We need to find the
regression line y = a + bx, so let’s start by calculating x̄ and ȳ.

$$ \bar{x} = (4 + 4.5 + 5 + 5.5 + 6 + 6.5 + 7)/7 = 38.5/7 = 5.5 $$

$$ \bar{y} = (12 + 10 + 8 + 9.5 + 8 + 9 + 6)/7 = 62.5/7 = 8.93 \text{ (to 2 decimal places)} $$

Next, let’s calculate Σ(x - x̄)(y - ȳ) and Σ(x - x̄)².

$$ \begin{aligned}
\sum (x - \bar{x})(y - \bar{y}) &= (4 - 5.5)(12 - 8.9) + (4.5 - 5.5)(10 - 8.9) + (5 - 5.5)(8 - 8.9) + (5.5 - 5.5)(9.5 - 8.9) \\
&\quad + (6 - 5.5)(8 - 8.9) + (6.5 - 5.5)(9 - 8.9) + (7 - 5.5)(6 - 8.9) \\
&= (-1.5)(3.1) + (-1)(1.1) + (-0.5)(-0.9) + (0)(0.6) + (0.5)(-0.9) + (1)(0.1) + (1.5)(-2.9) \\
&= -4.65 - 1.1 + 0.45 + 0 - 0.45 + 0.1 - 4.35 \\
&= -10
\end{aligned} $$

$$ \begin{aligned}
\sum (x - \bar{x})^2 &= (4 - 5.5)^2 + (4.5 - 5.5)^2 + (5 - 5.5)^2 + (5.5 - 5.5)^2 + (6 - 5.5)^2 + (6.5 - 5.5)^2 + (7 - 5.5)^2 \\
&= (-1.5)^2 + (-1)^2 + (-0.5)^2 + 0^2 + 0.5^2 + 1^2 + 1.5^2 \\
&= 2.25 + 1 + 0.25 + 0 + 0.25 + 1 + 2.25 \\
&= 7
\end{aligned} $$

$$ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-10}{7} = -1.43 \text{ (to 2 decimal places)} $$

Now we’ve found b, let’s use it to find a.

$$ a = \bar{y} - b\bar{x} = 8.93 + 1.43 \times 5.5 \approx 8.93 + 7.86 = 16.79 $$

This means that the line of best fit is given by y = 16.79 - 1.43x.

The correlation coefficient, r, is given by r = b s_x/s_y, where s_x and s_y are the standard deviations of the x and y
variables. We’ve found b, so we need to find s_x and s_y.

$$ s_x = \sqrt{7/6} = 1.08 \text{ (to 2 decimal places)} $$

$$ \begin{aligned}
\sum (y - \bar{y})^2 &= (12 - 8.9)^2 + (10 - 8.9)^2 + (8 - 8.9)^2 + (9.5 - 8.9)^2 + (8 - 8.9)^2 + (9 - 8.9)^2 + (6 - 8.9)^2 \\
&= 3.1^2 + 1.1^2 + (-0.9)^2 + 0.6^2 + (-0.9)^2 + 0.1^2 + (-2.9)^2 \\
&= 9.61 + 1.21 + 0.81 + 0.36 + 0.81 + 0.01 + 8.41 \\
&= 21.22
\end{aligned} $$

$$ s_y = \sqrt{21.22/6} = 1.88 \text{ (to 2 decimal places)} $$

Putting this together gives us

$$ r = \frac{b \, s_x}{s_y} = \frac{-1.43 \times 1.08}{1.88} = -0.82 \text{ (to 2 decimal places)} $$

If x = 5, then we find y by calculating

$$ y = 16.79 - 1.43x = 16.79 - 1.43 \times 5 = 9.64 $$

In other words, after 5 minutes of exposure to radiation we’d expect Captain Amazing to be able to lift 9.64 tons.
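
As a cross-check on the hand-worked solution, the following Python sketch runs the same steps on the seven data points without rounding any intermediate values. The variable names are illustrative only. Since the hand calculation rounds ȳ and b along the way, expect agreement to about two decimal places rather than exactly.

```python
from math import sqrt

# Captain Amazing data: minutes of radiation exposure (x), tons lifted (y).
minutes = [4, 4.5, 5, 5.5, 6, 6.5, 7]
tons = [12, 10, 8, 9.5, 8, 9, 6]

n = len(minutes)
x_bar = sum(minutes) / n
y_bar = sum(tons) / n

# The three sums of squares and products used throughout the solution.
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(minutes, tons))
sum_xx = sum((x - x_bar) ** 2 for x in minutes)
sum_yy = sum((y - y_bar) ** 2 for y in tons)

b = sum_xy / sum_xx          # slope of the line of best fit
a = y_bar - b * x_bar        # intercept
s_x = sqrt(sum_xx / (n - 1)) # sample standard deviation of x
s_y = sqrt(sum_yy / (n - 1)) # sample standard deviation of y
r = b * s_x / s_y            # correlation coefficient
predicted = a + b * 5        # expected weight after 5 minutes

print(f"y = {a:.2f} + ({b:.2f})x, r = {r:.2f}, prediction at x = 5: {predicted:.2f}")
```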


BULLET POINTS - 

■ Univariate data deals with just one 
variable. Bivariate data deals with two 
variables. 

■ A scatter diagram shows you patterns in 
bivariate data. 

■ Correlations are mathematical
relationships between variables. A correlation
does not mean that one variable causes the
other. A linear correlation is one that
follows a straight line.

■ Positive linear correlation is when low 
x values correspond to low y values, 
and high x values correspond to high 
y values. Negative linear correlation is 
when low x values correspond to high 
y values, and high x values correspond 
to low y values. If the values of x and y 
form a random pattern, then there’s no 
correlation. 

■ The line that best fits the data points is 
called the line of best fit. 

■ Linear regression is a mathematical way 
of finding the line of best fit, y = a + bx. 


■ The sum of squared errors, or SSE, is
given by Σ(y - ŷ)².

■ The slope of the line y = a + bx is

$$ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$

■ The value of a is given by

$$ a = \bar{y} - b\bar{x} $$

■ The correlation coefficient, r, is a number
between -1 and 1 that describes the
scatter of data away from the line of
best fit. If r = -1, there is perfect negative
linear correlation. If r = 1, there is perfect
positive linear correlation. If r = 0, there
is no linear correlation. You find r by calculating

$$ r = \frac{b \, s_x}{s_y} $$


Leaving town...



It’s been great having you here in Statsville!

We’re sad to see you leave, but there’s nothing like taking what you’ve learned
and putting it to use. There are still a few more gems for you in the back of the book,
some handy probability tables, and an index to read through, and then it’s time to take all
these new ideas and put them into practice. We’re dying to hear how things go, so drop
us a line at the Head First Labs web site, www.headfirstlabs.com, and let us know how
Statistics is paying off for YOU!


appendix i: leftovers



Even after all that, there’s still a bit more. 

There are just a few more things we think you need to know. We wouldn’t feel right about 
ignoring them, even though they only need a brief mention, and we really wanted to 
give you a book you’d be able to lift without extensive training at the local gym. So before 
you put the book down, take a read through these tidbits. 




# 1. Other ways of presenting data 


We showed you a number of charts in the first chapter, but here are a 
couple more that might come in useful. 


Dotplots


A dotplot shows your data on a chart by representing each value as a dot.
You put each dot in a stacked column above the corresponding value on
the horizontal axis like this:

[Figure: dotplot with the values 0 to 5 on the horizontal axis, labeled “No. games bought per month”]


Stemplots 

A stemplot is used for quantitative data, usually when your data set is fairly
small. Stemplots show each exact value in your data set in such a way that
you can easily see the shape of your data. Here’s an example.

Here’s your data:

16 17 22 23 23 24 25 26 26 27 28
29 29 30 31 31 32 32 33 34 34 35
36 37 37 38 39 40 41 42 42 43 43
44 45 45 49 50 50 50 51 55 58 60

Here’s a stemplot based on the data:

60 | 0
50 | 0 0 0 1 5 8
40 | 0 1 2 2 3 3 4 5 5 9
30 | 0 1 1 2 2 3 4 4 5 6 7 7 8 9
20 | 2 3 3 4 5 6 6 7 8 9 9
10 | 6 7

Key: 10 | 6 = 16


The entries on the left are called stems, and the entries on the right are
called leaves. In this stemplot, the stem shows tens, and the leaves show
units. To find each value in the raw data, you take each leaf and add it to its
stem. As an example, take the line

10 | 6 7

This represents two numbers, 16 and 17. You get 16 by adding the leaf 6 to
its stem 10. Similarly, you get the value 17 by adding the leaf 7 to the stem 10.

A stemplot has a shape that is similar to a histogram’s, but flipped onto its side.

There’s usually a key to help you interpret the stemplot correctly. In this case,
the key is 10 | 6 = 16.
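
The stem-and-leaf split described above is easy to automate. Here’s a rough Python sketch that groups each value by its tens stem and lists the unit leaves; the function name `stemplot` and its `stem_unit` parameter are our own inventions for illustration. One simplification to be aware of: a stem with no values is simply left out, whereas a hand-drawn stemplot would usually show it with no leaves.

```python
from collections import defaultdict

data = [
    16, 17, 22, 23, 23, 24, 25, 26, 26, 27, 28,
    29, 29, 30, 31, 31, 32, 32, 33, 34, 34, 35,
    36, 37, 37, 38, 39, 40, 41, 42, 42, 43, 43,
    44, 45, 45, 49, 50, 50, 50, 51, 55, 58, 60,
]


def stemplot(values, stem_unit=10):
    """Return the stemplot as a list of text lines, largest stem first."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        stem = (value // stem_unit) * stem_unit  # e.g. 16 -> stem 10
        leaves_by_stem[stem].append(value - stem)  # e.g. 16 -> leaf 6
    lines = []
    for stem in sorted(leaves_by_stem, reverse=True):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem[stem])
        lines.append(f"{stem} | {leaves}")
    return lines


for line in stemplot(data):
    print(line)
```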
# 2. Distribution anatomy


There are two rules that tell you where most of your data values lie in a 
probability distribution. 


The empirical rule for normal distributions 


The empirical rule applies to any set of data that follows a normal 
distribution. It states that almost all of the values lie within three standard 
deviations of the mean. In particular, 


• About 68% of your values lie within 1
standard deviation of the mean.

• About 95% of your values lie within 2
standard deviations of the mean.

• About 99.7% of your values lie within 3
standard deviations of the mean.

Just knowing the number of standard deviations from the mean can 
give you a rough idea about the probability. 


[Figure: normal curve with the horizontal axis marked -3σ, -2σ, -σ, μ, σ, 2σ, 3σ]


Chebyshev's rule for any distribution 

A similar rule, called Chebyshev’s rule or Chebyshev’s inequality, applies to any
set of data. It states that for any distribution

• At least 75% of your values lie within 2 standard deviations of the mean.

• At least 89% of your values lie within 3 standard deviations of the mean.

• At least 94% of your values lie within 4 standard deviations of the mean.


Chebyshev’s rule isn’t as precise as the empirical rule, as it only gives you the
minimum percentages, but it still gives you a rough idea of where values fall in the
probability distribution. The advantage of Chebyshev’s rule is that it applies to any
distribution, while the empirical rule just applies to the normal distribution.
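
To see the two rules side by side, here is a small Python sketch. It relies on two facts that aren’t spelled out in the text above: the usual statement of Chebyshev’s inequality, that at least 1 - 1/k² of values lie within k standard deviations of the mean, and the fact that the normal proportion within k standard deviations can be written with the error function as erf(k/√2). The function names are purely illustrative.

```python
from math import erf, sqrt


def chebyshev_lower_bound(k):
    """Minimum proportion within k standard deviations, for any distribution."""
    return 1 - 1 / k ** 2


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations."""
    return erf(k / sqrt(2))


for k in (2, 3, 4):
    print(
        f"k = {k}: Chebyshev at least {chebyshev_lower_bound(k):.1%}, "
        f"normal about {normal_proportion_within(k):.1%}"
    )
```

The Chebyshev figures are always the smaller of the two, which is the price of a rule that makes no assumption about the shape of the distribution.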
# 3. Experiments

Experiments are used to test cause and effect relationships between variables. As an
example, an experiment could test the effect of different doses of SnoreCull on snorers.

In an experiment, independent variables or factors are manipulated so that we can
see the effect on dependent variables. As an example, we might want to examine the
effect that different doses of SnoreCull have on the number of hours spent snoring in a
night. The doses of SnoreCull would be the independent variable, and the number of
hours spent snoring would be the dependent variable.

The subjects that you use for your experiment are called experimental units — in this 
case, snorers. 



So what makes for a good experiment? 

There are three basic principles you need to bear in mind when you design an
experiment: controls, randomization, and replication. As with sampling, a
key aim is to minimize bias.


• You need to control the effects of external influences or natural variability.

When you conduct an experiment, you need to minimize effects that are not part of the
experiment. To do this, the first thing is to have a control group, a neutral group that receives
no treatments, or only neutral treatments. You can assess the effectiveness of the treatment by
comparing the results of your treated groups with the results of your control group.

A placebo is a neutral treatment, one that has no effect on the dependent variable. Sometimes
the subjects of your experiment can respond differently to having a neutral treatment as opposed
to having no treatment at all, so giving a placebo to a group is a way of controlling this effect. If
the group taking a placebo doesn’t know that it’s a placebo, then this is called blinding, and it’s
called double blinding if even those administering the treatments don’t know.

• You need to assign subjects to treatments at random.

You’ll see more about this shortly.

• You need to replicate treatments.

Each treatment should be given to many subjects. You need to use many snorers per treatment to
gauge the effects, not just one snorer.


Another factor to be aware of is confounding. Confounding occurs when the
controls in an experiment don’t eliminate other possible causes for the effect on the
dependent variable. As an example, imagine if you gave doses of SnoreCull to men,
but placebos to women. If you compared the results of the two groups, you wouldn’t
be able to tell whether the effect on the men was because of the drug, or because one
gender naturally snores more than another.


Designing your experiment


We said earlier that you need to randomly assign subjects to treatments. But what’s
the best way of doing this?


Completely randomized design 

One option is to use a completely randomized design. For
this, you literally assign treatments to subjects at random. If
we were to conduct an experiment testing the effect of doses
of SnoreCull on snorers, we would randomly assign snorers to
particular treatment groups. As an example, we could give half
of the snorers a placebo and the other half a single dose of
SnoreCull.

Completely randomized design is similar to simple random
sampling. Instead of choosing a sample at random, you assign
treatments at random.

Placebo   SnoreCull
500       500

If there were 1,000 subjects, we could give half a placebo and the
other half a dose of SnoreCull.


Randomized block design 


Another option is to use randomized block design. For
this, you divide the subjects into similar groups, or blocks.

As an example, you could split the snorers into males and
females. Within each block, you assign treatments at random,
so for each gender, you could give half the snorers a dose of
SnoreCull and give the other half a placebo. The aim of this
is to minimize confounding, as it reduces the effect of gender.

Randomized block design is similar to stratified random
sampling. Instead of splitting your population into strata, you
split your subjects into blocks.

         Placebo   SnoreCull
Male     250       250
Female   250       250

If there were 500 men and 500 women, we could give half of each
a placebo and the other half a dose of SnoreCull.
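
Both designs boil down to shuffling and dealing out subjects, which the following Python sketch illustrates. Everything in it is hypothetical: the subject labels, the fixed seed, and the function names `completely_randomized` and `randomized_block` are made up for the example, and the group sizes mirror the tables above.

```python
import random


def completely_randomized(subjects, treatments, rng):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {
        treatment: shuffled[i::len(treatments)]
        for i, treatment in enumerate(treatments)
    }


def randomized_block(subjects_by_block, treatments, rng):
    """Randomize separately inside each block (for example, each gender)."""
    return {
        block: completely_randomized(members, treatments, rng)
        for block, members in subjects_by_block.items()
    }


rng = random.Random(42)  # fixed seed so the sketch is repeatable
treatments = ["placebo", "SnoreCull"]
blocks = {
    "male": [f"M{i}" for i in range(500)],
    "female": [f"F{i}" for i in range(500)],
}
all_subjects = blocks["male"] + blocks["female"]

crd = completely_randomized(all_subjects, treatments, rng)
rbd = randomized_block(blocks, treatments, rng)

print({t: len(group) for t, group in crd.items()})
print({b: {t: len(g) for t, g in groups.items()} for b, groups in rbd.items()})
```

In the completely randomized version the split of men and women within each treatment group is left to chance; the block version guarantees 250 of each gender per treatment.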


Matched pairs design 

Matched pairs design is a special case of randomized
block design. You can use it when there are only two
treatment conditions and subjects can be grouped into like
pairs. As an example, the SnoreCull experiment could have
two treatment conditions, to give a placebo or to give a
single dose, and snorers could be grouped into similar pairs
according to gender and age. You then give one of each
pair a placebo, and the other a dose of SnoreCull. If one
pair consisted of two men aged 30, for instance, you would
give one of the men a placebo and the other man a dose of
SnoreCull.

            Placebo   SnoreCull
Male 30     1         1
Male 30     1         1
Female 30   1         1
Female 30   1         1
...         ...       ...


# 4. Least squares regression alternate notation


In Chapter 15 you saw how a least squares regression line takes the
form y = a + bx, where

$$ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$

This is the formula for the slope.

There’s another form of writing this that a lot of people find easier to
remember, and that’s to rewrite it in terms of variances. If we use the
notation

$$ s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \qquad s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1} \qquad s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} $$

where s_x² is the sample variance of the x values and s_y² is the sample variance
of the y values, then you can rewrite the formula for the slope of a line as

$$ b = \frac{s_{xy}}{s_x^2} $$

This is the same calculation, just written in a different way.

You can do something similar with the correlation coefficient. Instead of writing

$$ r = \frac{b \, s_x}{s_y} $$

you can write

$$ r = \frac{s_{xy}}{s_x s_y} $$

s_xy is called the covariance. Just as the variance of x describes how the x values
vary, and the variance of y describes how the y values vary, the covariance of x
and y is a measure of how x and y vary together.
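
A quick way to convince yourself that the covariance form and the slope form of r agree is to compute both. The sketch below does this for the concert data from Chapter 15; the helper name `covariance` is our own, and the variance of x is obtained as the covariance of x with itself.

```python
from math import isclose, sqrt

sunshine = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
attendance = [22, 33, 30, 42, 38, 49, 42, 55]


def covariance(xs, ys):
    """Sample covariance: sum((x - xbar)(y - ybar)) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


s_xy = covariance(sunshine, attendance)
s_x2 = covariance(sunshine, sunshine)      # variance of x
s_y2 = covariance(attendance, attendance)  # variance of y

b = s_xy / s_x2                            # slope in covariance notation
r_from_covariance = s_xy / (sqrt(s_x2) * sqrt(s_y2))
r_from_slope = b * sqrt(s_x2) / sqrt(s_y2)

print(f"b = {b:.2f}")
print(f"r (covariance form) = {r_from_covariance:.4f}")
print(f"r (slope form)      = {r_from_slope:.4f}")
```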
# 5. The coefficient of determination

The coefficient of determination is given by r² or R². It’s the percentage
of variation in the y variable that’s explainable by the x variable. As an
example, you can use it to say what percentage of the variation in open-air
concert attendance is explainable by the number of hours of predicted
sunshine.

[Figure: scatter diagram of attendance (100s), y, against sunshine (hours), x]

If r² = 0, then this means that you can’t predict the y value from the x value.

If r² = 1, then you can predict the y value from the x value without any errors.

Usually r² is between these two extremes. The closer the value of r² is to 1,
the more predictable the value of y is from x, and the closer to 0 it is, the less
predictable the value of y is.


Calculating r²


There are two ways of calculating r². The first way is to just square the
correlation coefficient r.

$$ r^2 = \left( \frac{s_{xy}}{s_x s_y} \right)^2 $$

This is just the correlation coefficient squared.

Another way of calculating it is to add together the squared distances of the
estimated y values, ŷ, from ȳ, and then divide by the result of adding together
the squared distances of the actual y values from ȳ.

$$ r^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} $$
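
Here is a short Python sketch that calculates r² both ways for the concert data, as a check that the two routes agree. It assumes the ratio form compares the fitted values ŷ = a + bx with ȳ in the numerator; all variable names are illustrative.

```python
from math import isclose, sqrt

sunshine = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
attendance = [22, 33, 30, 42, 38, 49, 42, 55]

n = len(sunshine)
x_bar = sum(sunshine) / n
y_bar = sum(attendance) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sunshine, attendance))
sum_xx = sum((x - x_bar) ** 2 for x in sunshine)
sum_yy = sum((y - y_bar) ** 2 for y in attendance)

# Way 1: square the correlation coefficient.
r = sum_xy / sqrt(sum_xx * sum_yy)
r2_from_r = r ** 2

# Way 2: variation of the fitted values about y-bar, divided by the
# variation of the actual values about y-bar.
b = sum_xy / sum_xx
a = y_bar - b * x_bar
fitted = [a + b * x for x in sunshine]
r2_from_ratio = sum((y_hat - y_bar) ** 2 for y_hat in fitted) / sum_yy

print(f"r^2 = {r2_from_r:.3f} (squared r), {r2_from_ratio:.3f} (ratio)")
```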




# 6. Non-linear relationships 


If two variables are related, their relationship isn’t necessarily linear. A
scatter plot can show a clear mathematical relationship in which the points
follow a curve rather than a straight line.



Linear regression assumes that the relationship between two variables can be 
described by a straight line, so performing least squares regression on raw data 
like this won’t give you a good estimate for the equation of the line. 

There is a way around this, however. You can sometimes transform x and y in 
such a way that the transformation is close to being linear. You can then perform 
linear regression on the transformation to find the values of a and b. The big trick 
is to try and transform your non-linear equation of the line so that it takes the 
form 


y’ = a + bx’

where y’ is a function of y and x’ is a function of x.

As an example, you might find that your line of best fit takes the form

y = 1/(a + bx)

If your line of best fit isn’t linear, you can sometimes transform it to
a linear form.

This can be rewritten as

1/y = a + bx

so that y’ = 1/y. It’s now in the form y’ = a + bx, so you can use linear
regression. In other words, you can perform least squares regression using
the line y’ = a + bx, where y’ = 1/y. Once you’ve transformed your y values, you
can use least squares regression to find the values of a and b, then substitute these
back into your original equation.
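
The reciprocal transformation can be sketched in a few lines of Python. The data here is hypothetical: it is generated exactly from y = 1/(2 + 0.5x) so that you can see the regression on 1/y recover a = 2 and b = 0.5. With real, noisy data the recovered values would only be estimates. The function names are our own.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data following y = 1 / (2 + 0.5x) exactly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1 / (2 + 0.5 * x) for x in xs]

# Transform y to y' = 1/y, then fit the straight line y' = a + bx.
y_transformed = [1 / y for y in ys]
a, b = least_squares(xs, y_transformed)


def predict(x):
    """Substitute a and b back into the original form y = 1 / (a + bx)."""
    return 1 / (a + b * x)


print(f"a = {a:.3f}, b = {b:.3f}, prediction at x = 10: {predict(10):.4f}")
```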




# 7. The confidence interval for the slope of a regression line 

You’ve seen how you can find confidence intervals for μ and σ². Well,
you can also find one for the slope of the regression line y = a + bx.

The confidence interval for b takes the form

b ± (margin of error)

But what’s the margin of error?

The margin of error for b

The margin of error is given by

margin of error = t(ν) × (standard deviation of b)

where ν = n - 2, and n is the number of observations in your sample. To
find the value of t(ν), use t-distribution probability tables to look up ν
and your confidence level.

The standard deviation of the sampling distribution of b is given by

$$ s_b = \frac{\sqrt{\sum (y - \hat{y})^2 / (n - 2)}}{\sqrt{\sum (x - \bar{x})^2}} $$

To calculate this, add together the differences squared between
each actual y observation and what you estimate it to be from the
regression line. Then divide by n - 2, and take the square root.
Once you’ve done this, divide the whole lot by the square root of
the total differences squared between each x observation and x̄.

This gives us a confidence interval of

(b - t(ν) s_b, b + t(ν) s_b)
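
To make the recipe concrete, here is a Python sketch that applies it to the concert data from Chapter 15. This worked example is ours rather than the book’s. It assumes a 95% confidence level, for which t-distribution tables give a value of about 2.447 with ν = 8 - 2 = 6 degrees of freedom; that number is typed in by hand rather than computed.

```python
from math import sqrt

sunshine = [1.9, 2.5, 3.2, 3.8, 4.7, 5.5, 5.9, 7.2]
attendance = [22, 33, 30, 42, 38, 49, 42, 55]

n = len(sunshine)
x_bar = sum(sunshine) / n
y_bar = sum(attendance) / n
sum_xx = sum((x - x_bar) ** 2 for x in sunshine)

# Least squares line y = a + bx.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(sunshine, attendance)) / sum_xx
a = y_bar - b * x_bar

# Sum of squared differences between each y and its estimate from the line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(sunshine, attendance))

# Standard deviation of the sampling distribution of b.
s_b = sqrt(sse / (n - 2)) / sqrt(sum_xx)

t_value = 2.447  # assumed: t(6) at 95% confidence, read from tables
margin = t_value * s_b
interval = (b - margin, b + margin)

print(f"s_b = {s_b:.3f}, interval = ({interval[0]:.2f}, {interval[1]:.2f})")
```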



If you’re taking 
a statistics 
exam where you 
have to use s b , 
the formula will 
be given to you. 


This means that you don’t have to 
memorize it; you just need to know how 
to apply it. 









Knowing the standard deviation of b has other uses too. As an 
example, you can also use it in hypothesis tests to test whether the 
slope of a regression line takes a particular value. 


# 8. Sampling distributions - the difference between two means


Sometimes it’s useful to know what the sampling distribution is like for the 
difference between the means of two normally distributed populations. You may 
want to use this to construct a confidence interval or conduct a hypothesis test. As 
an example, you may want to conduct a hypothesis test based on the means of two 
normally distributed populations being equal. 


If X ~ N(μ_x, σ_x²) and Y ~ N(μ_y, σ_y²) where X and Y are independent, then the
expectation and variance of the distribution of X̄ - Ȳ are given by

$$ E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_x - \mu_y $$

$$ Var(\bar{X} - \bar{Y}) = Var(\bar{X}) + Var(\bar{Y}) = \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y} $$

where n_x and n_y are the sizes of the two samples.

If the population variances σ_x² and σ_y² are known, then X̄ - Ȳ is distributed
normally. In other words

$$ \bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y, \; \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right) $$

You can use this to find a confidence interval for μ_x - μ_y. Confidence intervals take
the form (statistic) ± (margin of error), so in this case, the confidence interval is
given by

$$ \bar{x} - \bar{y} \pm c\sqrt{Var(\bar{X} - \bar{Y})} $$

The value of c depends on the level of confidence you need for your confidence
interval:

Level of confidence   Value of c
90%                   1.64
95%                   1.96
99%                   2.58

If σ_x² and σ_y² are unknown, then you will need to approximate them with s_x² and
s_y². If the sample sizes are large, then you can still use the normal distribution. If
the sample sizes are small, then you will need to use the t-distribution instead.
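
As an illustration of the known-variance case, the following Python sketch builds the interval from the sample means, the population variances, and the sample sizes. The numbers in the example call are invented purely for demonstration, and the function name is our own; the values of c come from the table of confidence levels.

```python
from math import sqrt

# Values of c for each level of confidence.
C_VALUES = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}


def mean_difference_interval(x_bar, y_bar, var_x, var_y, n_x, n_y, level=0.95):
    """Confidence interval for the difference between two population means,
    assuming the population variances var_x and var_y are known."""
    variance = var_x / n_x + var_y / n_y  # Var(X-bar - Y-bar)
    margin = C_VALUES[level] * sqrt(variance)
    difference = x_bar - y_bar
    return difference - margin, difference + margin


# Hypothetical figures: sample means 10 and 8, population variances 4 and 9,
# and 100 observations in each sample.
low, high = mean_difference_interval(10, 8, 4, 9, 100, 100)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```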


# 9. Sampling distributions - the difference between two proportions 


There’s also a sampling distribution for the difference between the proportions 
of two binomial populations. You can use this to construct a confidence interval 
or conduct a hypothesis test. As an example, you may want to conduct a 
hypothesis test based on the proportions of two populations being equal. 

If X ~ B(n_x, p_x) and Y ~ B(n_y, p_y) where X and Y are independent, then the
expectation and variance of the distribution of P_x - P_y, the difference between
the proportions of successes in samples from each population, are given by

$$ E(P_x - P_y) = E(P_x) - E(P_y) = p_x - p_y $$

$$ Var(P_x - P_y) = Var(P_x) + Var(P_y) = \frac{p_x q_x}{n_x} + \frac{p_y q_y}{n_y} $$

where q = 1 - p.

If np and nq are both greater than 5 for each population, then P_x - P_y can be
approximated with a normal distribution. In other words

$$ P_x - P_y \sim N\left(p_x - p_y, \; \frac{p_x q_x}{n_x} + \frac{p_y q_y}{n_y}\right) $$

You can use this to find a confidence interval for p_x - p_y. Confidence intervals
take the form (statistic) ± (margin of error), so in this case the confidence
interval is given by

$$ (P_x - P_y) \pm c\sqrt{Var(P_x - P_y)} $$

using the proportions observed in your samples.

The value of c depends on the level of confidence you need for your confidence
interval. They’re the same values of c as for the difference between two means.
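
The same idea can be sketched for proportions. This Python example first checks the np and nq condition for the normal approximation and then builds the interval; it plugs the observed sample proportions into the variance formula, which is a common practical shortcut rather than something stated above. The figures and function names are hypothetical.

```python
from math import sqrt


def normal_approximation_ok(n, p):
    """Check that np and nq are both greater than 5."""
    return n * p > 5 and n * (1 - p) > 5


def proportion_difference_interval(p_x, n_x, p_y, n_y, c=1.96):
    """Confidence interval for the difference between two proportions.
    p_x and p_y are the proportions observed in each sample, and
    c = 1.96 corresponds to a 95% level of confidence."""
    variance = p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y
    margin = c * sqrt(variance)
    difference = p_x - p_y
    return difference - margin, difference + margin


# Hypothetical samples: 60% successes out of 200, and 50% out of 250.
assert normal_approximation_ok(200, 0.6) and normal_approximation_ok(250, 0.5)
low, high = proportion_difference_interval(0.6, 200, 0.5, 250)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```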



If you’re taking a statistics exam where you
have to use the sampling distribution of the
difference between two means or two proportions, the
variance of the sampling distribution will be
given to you.


This means that you don’t have to memorize them; you just 
need to know how to apply them. 


# 10. E(X) and Var(X) for continuous probability distributions

When we found the expectation and variance of discrete probability 
distributions, we used the equations 


E(X) = ΣxP(X = x)

Var(X) = Σx²P(X = x) - E²(X)


When your probability distribution is continuous, you find the expectation and 
variance using area. 

As an example, suppose you have a continuous probability distribution where the 
probability density function is given by 


f(x) = 0.05 

f(x) 

— ^ 

-fWis is tailed a 0.05 

as ^ ^ a 

do 灼 sta 灼七 vaUe. 


A 


o 


0 < x < 20 




20 


X 




Finding E(X) 

To find the expectation, we’d need to find the area under the curve xf(x) for the 
range of the probability distribution. Here we need to find the area under the 
line 0.05x where x is between 0 and 20 



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leftovers 


Finding Var(X) 

To find the variance, you need to find the area under the curve $x^2 f(x)$ and
subtract $E^2(X)$. In other words, we need to find the area under the curve
$0.05x^2$ between 0 and 20 and subtract the square of E(X).
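Carrying the same example through (again a worked sketch rather than the original page's own working, using E(X) = 10, which is the area of the triangle under 0.05x between 0 and 20):

```latex
\int_0^{20} 0.05x^2 \, dx = 0.05 \times \frac{20^3}{3} = \frac{400}{3} \approx 133.33
\qquad
\mathrm{Var}(X) = \frac{400}{3} - 10^2 = \frac{100}{3} \approx 33.33
```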

[Figure: the curve x²f(x) plotted against x]


You don’t often need to find the expectation and variance of a continuous
random variable.

A lot of the time you’ll be working with distributions like the normal, and in
this case, the expectation and variance are given to you.








In general, you can find the expectation and variance of a continuous
random variable using

$E(X) = \int x f(x)\,dx$

$\mathrm{Var}(X) = \int x^2 f(x)\,dx - E^2(X)$

where each integral is taken over the entire range of x.
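If you would rather let a computer find the areas, the two integrals can be approximated numerically. The sketch below is an illustration only: `integrate` and `expectation_and_variance` are hypothetical helper names, and the midpoint rule is just one simple way to estimate an area.

```python
def integrate(g, a, b, steps=100_000):
    """Approximate the area under g between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(g(a + (i + 0.5) * width) for i in range(steps)) * width

def expectation_and_variance(f, a, b):
    """Return (E(X), Var(X)) for a density f that is non-zero on a < x < b."""
    e_x = integrate(lambda x: x * f(x), a, b)        # area under x f(x)
    e_x2 = integrate(lambda x: x * x * f(x), a, b)   # area under x^2 f(x)
    return e_x, e_x2 - e_x ** 2

# The example density: f(x) = 0.05 where 0 < x < 20.
e_x, var_x = expectation_and_variance(lambda x: 0.05, 0, 20)
print(round(e_x, 4), round(var_x, 4))
```

For this density the results agree with the uniform-distribution formulas (a + b)/2 = 10 and (b - a)²/12 ≈ 33.33.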



Vital Statistics

Uniform Distribution

If X follows a uniform distribution, then

$f(x) = 1/(b - a)$ where $a < x < b$
$E(X) = (a + b)/2$
$\mathrm{Var}(X) = (b - a)^2/12$
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Looking Things Up ♦ 





Where would you be without your trusty probability tables? 

Understanding your probability distributions isn’t quite enough. For some of them, you 
need to be able to look up your probabilities in standard probability tables. In this 
appendix, you’ll find tables for the normal, t, and χ² distributions, so you can look up
probabilities to your heart’s content. 


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normal probabilities table 


# 1. Standard normal probabilities 

This table gives you the probability P(Z < z), where Z ~ N(0, 1). To find
P(Z < z), look up your value of z to 2 decimal places, then read off the
probability. The row gives z to 1 decimal place, and the column gives the
second decimal place.


[Figure: normal curve showing the area P(Z < z) to the left of z]

These are the probabilities for P(Z < z) where z is negative.

   z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
-3.4  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
-3.3  .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
-3.2  .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
-3.1  .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
-3.0  .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
-2.9  .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
-2.8  .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
-2.7  .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
-2.6  .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
-2.5  .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
-2.4  .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
-2.3  .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
-2.2  .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
-2.1  .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
-2.0  .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
-1.9  .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
-1.8  .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
-1.7  .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
-1.6  .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
-1.5  .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
-1.4  .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
-1.3  .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
-1.2  .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
-1.1  .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
-1.0  .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
-0.9  .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
-0.8  .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
-0.7  .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
-0.6  .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
-0.5  .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
-0.4  .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
-0.3  .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
-0.2  .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
-0.1  .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
-0.0  .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641
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statistics tables 


# 1. Standard normal probabilities (cont.)

These are the probabilities for P(Z < z) where z is positive.

   z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 0.3  .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
 0.4  .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
 0.5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 0.6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 0.7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 0.8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 0.9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 1.0  .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
 1.1  .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
 1.2  .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
 1.3  .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
 1.4  .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
 1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
 1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
 1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
 1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
 1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
 2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
 2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
 2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
 2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
 2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
 2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
 2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
 3.0  .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
 3.1  .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
 3.2  .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
 3.3  .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
 3.4  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998
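If you have Python to hand, the same probabilities can be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library (version 3.8 or later); the helper name `table_lookup` is made up for this example.

```python
from statistics import NormalDist

# Z ~ N(0, 1), the standard normal distribution the table is built on.
Z = NormalDist(mu=0, sigma=1)

def table_lookup(z):
    """Return P(Z < z) to 4 decimal places, with z rounded to 2 places as in the table."""
    return round(Z.cdf(round(z, 2)), 4)

print(table_lookup(-1.96))  # compare with row -1.9, column .06 (.0250)
print(table_lookup(1.00))   # compare with row 1.0, column .00 (.8413)
```

Computed values can occasionally differ from a printed table in the last digit because of rounding.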
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t-distribution table 


# 2. t-distribution critical values

This table gives you the values of t where P(T > t) = p. T follows a t-distribution
with ν degrees of freedom. Look up ν in the first column and p in the first row,
then read off t.

[Figure: t-distribution curve with the upper-tail area P(T > t) marked]

probability p

    ν    .25    .20    .15    .10    .05   .025    .02    .01   .005  .0025   .001  .0005
    1  1.000  1.376  1.963  3.078  6.314  12.71  15.89  31.82  63.66  127.3  318.3  636.6
    2   .816  1.061  1.386  1.886  2.920  4.303  4.849  6.965  9.925  14.09  22.33  31.60
    3   .765   .978  1.250  1.638  2.353  3.182  3.482  4.541  5.841  7.453  10.21  12.92
    4   .741   .941  1.190  1.533  2.132  2.776  2.999  3.747  4.604  5.598  7.173  8.610
    5   .727   .920  1.156  1.476  2.015  2.571  2.757  3.365  4.032  4.773  5.893  6.869
    6   .718   .906  1.134  1.440  1.943  2.447  2.612  3.143  3.707  4.317  5.208  5.959
    7   .711   .896  1.119  1.415  1.895  2.365  2.517  2.998  3.499  4.029  4.785  5.408
    8   .706   .889  1.108  1.397  1.860  2.306  2.449  2.896  3.355  3.833  4.501  5.041
    9   .703   .883  1.100  1.383  1.833  2.262  2.398  2.821  3.250  3.690  4.297  4.781
   10   .700   .879  1.093  1.372  1.812  2.228  2.359  2.764  3.169  3.581  4.144  4.587
   11   .697   .876  1.088  1.363  1.796  2.201  2.328  2.718  3.106  3.497  4.025  4.437
   12   .695   .873  1.083  1.356  1.782  2.179  2.303  2.681  3.055  3.428  3.930  4.318
   13   .694   .870  1.079  1.350  1.771  2.160  2.282  2.650  3.012  3.372  3.852  4.221
   14   .692   .868  1.076  1.345  1.761  2.145  2.264  2.624  2.977  3.326  3.787  4.140
   15   .691   .866  1.074  1.341  1.753  2.131  2.249  2.602  2.947  3.286  3.733  4.073
   16   .690   .865  1.071  1.337  1.746  2.120  2.235  2.583  2.921  3.252  3.686  4.015
   17   .689   .863  1.069  1.333  1.740  2.110  2.224  2.567  2.898  3.222  3.646  3.965
   18   .688   .862  1.067  1.330  1.734  2.101  2.214  2.552  2.878  3.197  3.611  3.922
   19   .688   .861  1.066  1.328  1.729  2.093  2.205  2.539  2.861  3.174  3.579  3.883
   20   .687   .860  1.064  1.325  1.725  2.086  2.197  2.528  2.845  3.153  3.552  3.850
   21   .686   .859  1.063  1.323  1.721  2.080  2.189  2.518  2.831  3.135  3.527  3.819
   22   .686   .858  1.061  1.321  1.717  2.074  2.183  2.508  2.819  3.119  3.505  3.792
   23   .685   .858  1.060  1.319  1.714  2.069  2.177  2.500  2.807  3.104  3.485  3.768
   24   .685   .857  1.059  1.318  1.711  2.064  2.172  2.492  2.797  3.091  3.467  3.745
   25   .684   .856  1.058  1.316  1.708  2.060  2.167  2.485  2.787  3.078  3.450  3.725
   26   .684   .856  1.058  1.315  1.706  2.056  2.162  2.479  2.779  3.067  3.435  3.707
   27   .684   .855  1.057  1.314  1.703  2.052  2.158  2.473  2.771  3.057  3.421  3.690
   28   .683   .855  1.056  1.313  1.701  2.048  2.154  2.467  2.763  3.047  3.408  3.674
   29   .683   .854  1.055  1.311  1.699  2.045  2.150  2.462  2.756  3.038  3.396  3.659
   30   .683   .854  1.055  1.310  1.697  2.042  2.147  2.457  2.750  3.030  3.385  3.646
   40   .681   .851  1.050  1.303  1.684  2.021  2.123  2.423  2.704  2.971  3.307  3.551
   50   .679   .849  1.047  1.299  1.676  2.009  2.109  2.403  2.678  2.937  3.261  3.496
   60   .679   .848  1.045  1.296  1.671  2.000  2.099  2.390  2.660  2.915  3.232  3.460
   80   .678   .846  1.043  1.292  1.664  1.990  2.088  2.374  2.639  2.887  3.195  3.416
  100   .677   .845  1.042  1.290  1.660  1.984  2.081  2.364  2.626  2.871  3.174  3.390
 1000   .675   .842  1.037  1.282  1.646  1.962  2.056  2.330  2.581  2.813  3.098  3.300
    ∞   .674   .841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807  3.091  3.291

Confidence level
         50%    60%    70%    80%    90%    95%    96%    98%    99%  99.5%  99.8%  99.9%
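A few rows of this table can be spot-checked with nothing more than the Python standard library, because the t-distribution has simple closed forms for 1 and 2 degrees of freedom and becomes the standard normal as ν grows without limit. This is only a sketch for those three special rows (the function names are made up for the example); other values of ν need a statistics package.

```python
from math import pi, sqrt, tan
from statistics import NormalDist

def t_critical_1df(p):
    """t with P(T > t) = p for 1 degree of freedom (a Cauchy distribution)."""
    return tan(pi * (0.5 - p))

def t_critical_2df(p):
    """t with P(T > t) = p for 2 degrees of freedom (closed-form inverse)."""
    u = 1 - 2 * p
    return u * sqrt(2 / (1 - u * u))

def t_critical_inf(p):
    """Limiting row: the t-distribution becomes the standard normal."""
    return NormalDist().inv_cdf(1 - p)

# Compare with the .05 column of rows 1, 2 and infinity.
for f in (t_critical_1df, t_critical_2df, t_critical_inf):
    print(f.__name__, round(f(0.05), 3))
```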
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statistics tables 


# 3. χ² critical values

This table gives you the value of x where P(X > x) = α. X
has a χ² distribution with ν degrees of freedom. Look up ν in the first
column and α in the first row, then read off the value of x.

probability α

    ν    .25    .20    .15    .10    .05   .025    .02    .01   .005  .0025   .001
    1   1.32   1.64   2.07   2.71   3.84   5.02   5.41   6.63   7.88   9.14  10.83
    2   2.77   3.22   3.79   4.61   5.99   7.38   7.82   9.21  10.60  11.98  13.82
    3   4.11   4.64   5.32   6.25   7.81   9.35   9.84  11.34  12.84  14.32  16.27
    4   5.39   5.99   6.74   7.78   9.49  11.14  11.67  13.28  14.86  16.42  18.47
    5   6.63   7.29   8.12   9.24  11.07  12.83  13.39  15.09  16.75  18.39  20.51
    6   7.84   8.56   9.45  10.64  12.59  14.45  15.03  16.81  18.55  20.25  22.46
    7   9.04   9.80  10.75  12.02  14.07  16.01  16.62  18.48  20.28  22.04  24.32
    8  10.22  11.03  12.03  13.36  15.51  17.53  18.17  20.09  21.95  23.77  26.12
    9  11.39  12.24  13.29  14.68  16.92  19.02  19.68  21.67  23.59  25.46  27.88
   10  12.55  13.44  14.53  15.99  18.31  20.48  21.16  23.21  25.19  27.11  29.59
   11  13.70  14.63  15.77  17.28  19.68  21.92  22.62  24.72  26.76  28.73  31.26
   12  14.85  15.81  16.99  18.55  21.03  23.34  24.05  26.22  28.30  30.32  32.91
   13  15.98  16.98  18.20  19.81  22.36  24.74  25.47  27.69  29.82  31.88  34.53
   14  17.12  18.15  19.41  21.06  23.68  26.12  26.87  29.14  31.32  33.43  36.12
   15  18.25  19.31  20.60  22.31  25.00  27.49  28.26  30.58  32.80  34.95  37.70
   16  19.37  20.47  21.79  23.54  26.30  28.85  29.63  32.00  34.27  36.46  39.25
   17  20.49  21.61  22.98  24.77  27.59  30.19  31.00  33.41  35.72  37.95  40.79
   18  21.60  22.76  24.16  25.99  28.87  31.53  32.35  34.81  37.16  39.42  42.31
   19  22.72  23.90  25.33  27.20  30.14  32.85  33.69  36.19  38.58  40.88  43.82
   20  23.83  25.04  26.50  28.41  31.41  34.17  35.02  37.57  40.00  42.34  45.31
   21  24.93  26.17  27.66  29.62  32.67  35.48  36.34  38.93  41.40  43.78  46.80
   22  26.04  27.30  28.82  30.81  33.92  36.78  37.66  40.29  42.80  45.20  48.27
   23  27.14  28.43  29.98  32.01  35.17  38.08  38.97  41.64  44.18  46.62  49.73
   24  28.24  29.55  31.13  33.20  36.42  39.36  40.27  42.98  45.56  48.03  51.18
   25  29.34  30.68  32.28  34.38  37.65  40.65  41.57  44.31  46.93  49.44  52.62
   26  30.43  31.79  33.43  35.56  38.89  41.92  42.86  45.64  48.29  50.83  54.05
   27  31.53  32.91  34.57  36.74  40.11  43.19  44.14  46.96  49.64  52.22  55.48
   28  32.62  34.03  35.71  37.92  41.34  44.46  45.42  48.28  50.99  53.59  56.89
   29  33.71  35.14  36.85  39.09  42.56  45.72  46.69  49.59  52.34  54.97  58.30
   30  34.80  36.25  37.99  40.26  43.77  46.98  47.96  50.89  53.67  56.33  59.70
   40  45.62  47.27  49.24  51.81  55.76  59.34  60.44  63.69  66.77  69.70  73.40
   50  56.33  58.16  60.35  63.17  67.50  71.42  72.61  76.15  79.49  82.66  86.66
   60  66.98  68.97  71.34  74.40  79.08  83.30  84.58  88.38  91.95  95.34  99.61
   80  88.13  90.41  93.11  96.58  101.9  106.6  108.1  112.3  116.3  120.1  124.8
  100  109.1  111.7  114.7  118.5  124.3  129.6  131.1  135.8  140.2  144.3  149.4
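The first two rows of this table can be reproduced with the Python standard library alone, which makes a handy sanity check. The sketch relies on two known special cases: with 1 degree of freedom X is the square of a standard normal variable, and with 2 degrees of freedom P(X > x) = exp(-x/2). The function names are made up for the example, and other values of ν need a statistics package.

```python
from math import log
from statistics import NormalDist

def chi2_critical_1df(alpha):
    """x with P(X > x) = alpha for 1 df, using X = Z**2 for standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2) ** 2

def chi2_critical_2df(alpha):
    """x with P(X > x) = alpha for 2 df, where P(X > x) = exp(-x / 2)."""
    return -2 * log(alpha)

# Compare with the .05 and .01 columns of rows 1 and 2.
for alpha in (0.05, 0.01):
    print(alpha, round(chi2_critical_1df(alpha), 2), round(chi2_critical_2df(alpha), 2))
```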
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Symbols 

| symbol (see conditional probabilities)

∩ intersection
finding 159

P(A ∩ B) versus P(A | B) 165
P(Black ∩ Even) 167
P(Even) 167
1/p, expectation 281

when large 407
when small 407

λ distribution (see Poisson distribution)

μ (mu) 50, 445

confidence intervals 498
ν (nu) 573

degrees of freedom 574
Σ (sigma) 49
mean 49
σ (sigma) 107
χ² (chi square) 576

χ² (chi square) distribution 567-604
cheat sheet 584
contingency table 587
defined 572

degrees of freedom 574, 576, 595
calculating 591
generalizing 596-597
expected frequencies 587-588
goodness of fit 573, 579, 584
independence 573, 586
main uses 573
significance 575
ν (nu) 573

χ² (chi square) hypothesis testing steps 576

χ² (chi square) probability tables 575

χ² (chi square) test 571

x̄ (x bar) 445-447, 472-476
distribution of 476—486 

A 

accurate linear correlation 630 

alternate hypothesis 529-530, 543 

average 46—82 

mean (see mean) 
median (see median) 
mode (see mode) 
types of 71 

average distance 105 

interquartile range 105 

B 

bar charts 10-20, 23
frequency scales 13 
percentage scales 12 
scales 23 

segmented bar chart 14 
split-category bar chart 14 

Bayes’ Theorem 173, 178-179

bias 423-426, 434, 438 

in sampling 424—426, 438 
sources 425 

bimodal 73 

binomial distribution 289, 324, 384, 392—393, 544 
approximating 389, 398, 407 
approximating with normal distribution 386 
approximating with Poisson distribution 316-317 
central limit theorem 482 


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binomial distribution (continued) 
discrete 395 

expectation and variance 298, 301 
finding mean and variance 389 
guide 302 

versus normal distribution 393, 395 
Binomial Distribution Up Close 297 
binomial probabilities 384 

bivariate data 608, 616, 640 
visualizing 609 
blinding 646 

box and whisker diagrams 100-102 
box plot 100 

Bullet Points 
bias 438 

binomial distribution 324 

bivariate data 640 

box and whisker diagram 102 

cluster sampling 438 

continuity correction 396 

continuous data 337 

continuous probability distributions 337 

correlation coefficient 640 

critical region 539 

cumulative frequency 42 

discrete data 337 

expectation and variance of X 485 
expectation of random variable X 224 
expectations 220, 233 
frequency density 30 
geometric distribution 324 
histograms 30 
hypothesis tests 539 
Type I error 566 
Type II error 566 
independent observations 378 
independent observations of X 233 
independent random variables 233 
interpercentile range 102 
interquartile range 97 


kth percentile 102 
linear regression 640 
linear transforms 220, 224, 233 
line of best fit 640 
negative linear correlation 640 
normal distribution 359 
approximating 396 
normal probabilities 359 
one-tailed tests 539 
p-value 539 
percentiles 102 
point estimator 447 
Poisson distribution 324, 412 
population 438 
positive linear correlation 640 
probability distributions 220, 224 
quartiles 97 
range 97 
sample 438 

sampling distribution of means 485 
sampling distribution of proportions 466 
scatter diagrams 640 
significance level 539 
simple random sampling 438 
standard deviation 122, 220
σ 224

standard error of proportion 466 

standard error of the mean 485 

standard scores 122 

stratified sampling 438 

sum of squared errors 640 

systematic sampling 438 

test statistic 539 

two-tailed tests 539 

univariate data 640 

upper and lower bounds 97 

variance of random variable X 224 

variances 122, 220, 233 

z-scores 122 

χ² distribution 598

goodness of fit test 598 
test for independence 598 
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c 

categorical data 18, 73 
mean 62 
median 62 

categories versus numbers 18-23 
causation versus correlation 614 
census 418 

central limit theorem 481—482, 485 
binomial distribution 482 
Poisson distribution 482 

central tendency 45-82 

charts and graphs 4 
bar charts 10-20, 23 
bar chart scales 23 
choosing right one 39—40 
comparing 6 

cumulative frequency 35, 42 
failure 9 

frequency 8—9, 23 
frequency scales 13 
histograms (see histograms) 
horizontal bar charts 11, 23
line charts 41, 42 
multiple sets of data 14, 23 
numerical data 23 
percentage scales 12
pie charts 8-9, 9, 23 
proportions 9 
scales 12 

segmented bar chart 14 
software 6 

split-category bar chart 14 
vertical bar charts 10-11, 23

Chebyshev’s inequality 645 

chi square (χ²) 576

chi square (χ²) distribution 567-604
cheat sheet 584 
contingency table 587 


defined 572 

degrees of freedom 574, 576, 595 
calculating 591 
generalizing 596—597 
expected frequencies 587-588 
goodness of fit 573, 579, 584 
independence 573, 586 
main uses 573 
significance 575 
ν (nu) 573

chi square (χ²) hypothesis testing steps 576

chi square (χ²) probability tables 575

chi square (χ²) test 571

clustered sampling 434 

cluster sampling 433—434, 436, 438 

coefficient of determination 649 

combinations (see permutations and combinations) 

combined weight 
continuous 365 
distributed 367 
distributed normally 365 

complementary event 136 

completely randomized design (experiments) 647 

conditional probabilities 157-160 
Bayes’ Theorem 173 
P(A ∩ B) versus P(A | B) 165
P(Black | Even) 170
probability tree 158-161 

confidence intervals 487-520, 539 
cheat sheet 504 
confidence level changes 518 
four steps for finding 491-502 

Step 1: Choose your population statistic 492, 508 
Step 2: Find its sampling distribution 492, 509 
Step 3: Decide on the level of confidence 494, 512 
Step 4: Find the confidence limits 496—501, 513 
introducing 490 
point estimators 493 

selecting appropriate confidence level 495 
size of sample changes 518 
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confidence intervals (continued) 
slope of regression line 651 
summary 503 
t-distributions 509-515 
probability tables 513 
shortcuts 515 
small sample 510 
standard score 511 
versus confidence level 507 

confidence level versus confidence interval 507 
confidence limits 496, 502, 513 
confounding 646 
contingency table 587 
continuity correction 395-398, 412 
Continuity Corrections Up Close 397 

continuous data 327, 337, 365 
frequency 328 

probability distribution 329—333 
range of values 333 
versus discrete data 366 

continuous probabilities 333 

continuous probability distributions 337 
E(X) and Var(X) 654-655 
continuous random variables 331 

continuous scale versus discrete probability 
distribution 395 
control group 646 
controls 646 

correlation and regression 605—642 
accurate linear correlation 630 
bivariate data 608, 616, 640 
visualizing 609 

correlation coefficient 630—634, 640 
correlation versus causation 614 
dependent variable 608 
explanatory variable 608 
independent variable 608 
least squares regression 626 
linear regression 626, 640 


line of best fit 618, 624, 640 
finding equation 622 
finding slope 623—624 
sum of squared errors 620—621 
negative linear correlation 613, 631, 640
no correlation 613, 631 
no linear correlation 630 
outliers 634 

perfect negative linear correlation 631 

perfect positive linear correlation 631 

positive linear correlation 613, 631, 640

regression line 626 

response variable 608 

scatter diagrams 609, 612, 616, 618, 640 

scatter plots 609 

sum of squared errors 640 

univariate data 608, 640 

correlation coefficient 631—634, 640 
formula 632 

least square regression 648 

critical region 531-534, 539, 548 

Critical Regions Up Close 534 

critical value 532 

cumulative frequency 34—38, 42 
graph 35 

D 

data 

categorical and numerical data 18 

categorical data 18 

grouped 19 

multiple sets of data 14 

numerical data 18 

qualitative data 18 

deciles 98 

degrees of freedom 574, 576, 595 
calculating 591 
generalizing 596—597 
number of 510 

dependent events 181, 189-190 
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dependent variables (experiments) 608, 646 

discrete data 329, 337, 370 

versus continuous data 326—327, 366 
discrete probability distributions 197-240 
expectation 204-208 
linear transforms 233 
expectations 219 

independent observations 224, 225-226 
linear relationship between E(X) and E(Y) 217-218 
linear transforms 219, 225-226 
expectation and variance 233 
linear transforms versus playing multiple games 221 
observation 222-224 
observation shortcuts 223 
Pool Puzzle 215—216 
random variables 
adding 230 
independent 233 
subtracting 231 
shortcut or formula 236 
variance 205-208, 219 
linear transforms 233 
versus continuous scale 395 

discrete random variables 202 

distribution 

anatomy 645 
mean 56 
of X + Y 370

dotplots 644 
double blinding 646 
drawing lots 431, 434

E 

E(X) and Var(X) for continuous probability distributions 
654-655 

empirical rule for normal distribution 645 

estimating populations and samples 441-486 
central limit theorem 481—482, 485 
binomial distribution 482 
Poisson distribution 482 


distribution of Ps 464-466
expectation of Ps 462
formulas 451 

point estimators 443-447, 452 
for population variance 457 
sampling distributions 485 
population mean 443, 446 
population parameters 444 
population proportion 454-457 
population variance 448-450 
probabilities for a sample 459 
proportions, sampling distribution of 460 
sample mean 445, 446 
sample variance 449, 452 
sampling distribution 466 
continuity correction 469 
of proportions 460 

sampling distribution of means 471-479 
distribution of x 480 
expectation for X 474—475 
variance of X 476 
standard error 485 
of mean 479 
of proportion 466 
variance of Ps 463 
x bar 445 
μ 445

events 132 

complementary 136 
dependent 181 
exclusive 147-154 

versus exhaustive 150 
independent 182-184 

versus dependent 189—190 
intersecting 147—154 
mutually exclusive 147, 150 

exclusive events 147-154, 150 
exhaustive 149 
exhaustive events 150 
expectations 204-208, 219, 220, 367 

1/p 281 

binomial distribution 298 
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expectations (continued) 

geometric distribution 280-281 
independent observations 378 
linear transforms 233 
Poisson distribution 308 
two games 222-224 

experimental units 646 

experiments 646 
designing 647 
explanatory variable 608 

F 

factorials 246, 248 

Fireside Chats, Dependent and Independent discuss their
differences 186-187 

Five Minute Mystery 

Case of the Broken Cookies 315 
Solved 318 

Case of the High Sunscreen Sales 611 
Solved 615 

Case of the Lost Coffee Sales 421 
Solved 429 

Case of the Missing Parameters 357 
Solved 358 

Case of the Moving Expectation 211 
Solved 220 

The Case of the Ambiguous Average 51 
Solved 81 

The Case of the Two Glasses 185 
Solved 188 

formulas for arrangements 248 

frequencies 8, 23, 67-68, 73 
comparing 14 
continuous data 328 
cumulative frequency 34—38, 42 
highest frequency group of values 52 
histograms 24-30 
percentages with no frequencies 12 

frequency density 27—32, 68 
Frequency Density Up Close 29 
frequency scales 13 


G 

Gaussian distribution 352 

geometric distribution 277-287, 297, 301, 324
guide 284 
inequalities 279 

pattern of expectations 280-281 
variance 281-284 

Geometric Distribution Up Close 278 

goodness of fit 573 
test 579 

graphs (see charts and graphs) 
grouped data 19 

H 

height probabilities 338-341 

histograms 19—28 

frequency 24-30, 25 
intervals 20 
making 20 

making area proportional to frequency 26—28 
mean 56 

unequal intervals 24—30 
when not to use 33 

horizontal bar charts 11, 23 
horse racing 243—246 

hypothesis tests 521-566 

alternate hypothesis 529-530, 543 
critical region 531-534, 539, 548 
critical value 532 
null hypothesis 528, 543 
one-tailed tests 534, 539 
p-value 539 

power of a hypothesis test 561 
process 526-539 
overview 527 

Step 1: Decide on the hypothesis 528-529, 543 
Step 2: Choose the test statistic 531, 544 
Step 3: Determine the critical region 532, 548 
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Step 4: Find the p-value 535—536 

Step 5: Is the sample result in the critical region? 

537 

Step 6: Make your decision 537 
significance level 533, 538, 539 
statistically significant 551 
test statistic 531, 539, 544, 547 
two-tailed tests 534, 539 
Type I error 555-560, 566 
Type II error 555—560, 566 



I

incorrect sampling unit 425
independence 573 

independent events 182-183, 189-190 
versus mutually exclusive 183 
independent observations 224-226, 377, 472 
expectation 378 
of X 233 
variance 378 

versus linear transforms 376-378 
independent random variables 230—233, 368 
independent variables 608, 646

information 

versus data 5 

visualizing (see visualizing information) 
interpercentile range 98, 102 

interquartile range 92—93, 97 
average distance 105 
versus the median 97 

intersecting events 147—154 
intersection 149—154 

K 

kth percentile 99, 102 



L

Law of Total Probability 172, 178
least squares regression 626, 648 
Least Squares Regression Up Close 626 
leaves 644 

left-skewed data 62, 64 

letters, using to represent numbers 48-49 

linear correlations 613, 630-631 

Linear Correlations Up Close 613 

linear regression 626, 640, 650 

linear relationship between E(X) and E(Y) 217-218 

linear transforms 219, 220, 224-226 
distribution 376 
expectation and variance 233 
versus independent observations 376-378 
versus playing multiple games 221 

line charts 41, 42 
Line Charts Up Close 41 

line of best fit 618, 622, 640 
finding equation 622 
finding slope 623-624 
minimizing errors 620—621 
non-linear 650 

sum of squared errors 620—621 
lower bounds 86, 97 
basketball scores 88 
lower quartile 92 
finding 94 

M 

matched pairs design (experiments) 647 

mean 47—60 

basketball scores 88 
binomial distribution 389 
calculating 50 
calculating when to use 78 
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mean (continued) 

categorical data 62 
distributions 56 
frequencies 52 
frequency density 68 
histograms 56 
of two middle numbers 61 
outliers 57-59 

positive and negative distances 105 

problems with 65-72 

skewed data 62, 64 

standard deviations from 121 

using letters to represent numbers 48-49 

versus median 62 

X + Y 368 

μ (mu) 50

Σ (sigma) 49

measuring probability 132 

median 61-70 

calculating when to use 78 
categorical data 62 
frequency density 68 
in three steps 62 
middle quartile 92 
problems with 65-72 
skewed data 64 
versus mean 62 

versus the interquartile range 97 
middle quartile 92 
modal class 73 
mode 73-80 

calculating when to use 78 
categorical data 73 
three steps for finding 74 

mu (see μ (mu))

multiple sets of data 14, 23 

mutually exclusive events 147, 150 


N

n! 248 

negative linear correlation 613, 631, 640 

no correlation 613, 631 

No Dumb Questions 

adding probabilities 143 
alternate hypothesis 530 
approximating binomial distribution 398 
arranging objects in circle 248 
average distance 

interquartile range 105 
Bayes’ Theorem 179
bias 426, 434 

binomial distribution 301, 412 

bivariate data 616 

box and whisker diagram 101 

breaking data into more than four pieces 97 

central limit theorem 485 

charts 5 

clustered sampling 434 

confidence intervals 491, 518, 539 

confidence interval versus confidence level 507 

continuity corrections 398, 412 

continuous data 370 

continuous distributions 352 

correlation coefficient 634 

cumulative frequency 36 

degrees of freedom 576, 595 

discrete data 370 

discrete random variable 203 

distribution of X + Y 370 

drawing lots 434 

E(X₁ + X₂) and E(2X) 224

expectation 208, 219 

factorials 248 

frequency density 30 

Gaussian distribution 352 

geometric distribution 277, 284, 301 

histograms 23, 30 

how data is spread out 97 
hypothesis tests 530, 552

independent events 184 

independent observations 378 

independent versus mutually exclusive 184 

information versus data 5 

interquartile range 97 

limit on intersecting events 154 

linear transforms 219, 378 

line charts 42 

line of best fit 624 

mean or median with categorical data 62 
mean with skewed data 62 
median 352
versus mean 62
versus the interquartile range 97

n! 248

normal distribution
accuracy of 398
approximating binomial or Poisson distribution 412

normal probability tables 352

null hypothesis 530

outliers 634

P(Black | Even) 179

permutations and combinations 263
arranging by type 257

point estimators 446, 452
and sampling distributions 485

Poisson distributions 311, 314, 412
approximating binomial distribution 317, 398

population mean 446

positive and negative distances 105

probabilities written as fractions, decimals, or percentages 139

probability 139
best method 143

probability density function 334

probability distributions 203
letters p and q 284
quiz show 290

probability for standardized range 347

probability of range 352

probability tables 352, 370

probability trees 165, 179

proportion versus probability 456

questionnaires 426

random variables 233

right- and left-skewed data 62

roulette wheel 184

sample mean 446

sample variance 452

sampling bias 434

sampling distribution 466

sampling frame 426

scatter diagrams 616

set theory 139

shortcuts 370

significance level 539

significance tests 552

slot machines 208

standard deviation 113, 122, 208

standard error 485
of proportion 466

standard scores 122, 347, 352
outliers 122

statistical sampling
bias 426
clustered sampling 434
drawing lots 434
increasing sample size 434
simple random sampling 434
stratified sampling 434
systematic sampling 434

t-distributions 518

target population 426

Type I error 560

Type II error 560

variance 122, 208

variance equations 113

variances 219

Venn diagrams 139, 165, 184

χ² (chi square) distribution 595

χ² (chi square) tests 576

no linear correlation 630 
non-linear relationships 650 
normal approximation 394

normal distribution 325-360, 361-414
accuracy of 398
approximating continuity correction 396
approximating binomial distribution 386
approximating binomial or Poisson distribution 412
approximating binomial probabilities 397
binomial distribution 384, 389, 392-393
approximating 398, 407
continuous 395
continuous data 337, 365
continuous distributions 352
continuous probability distributions 337
defined 339-340
discrete data 337
discrete data versus continuous data 326-327
empirical rule 645
finding < probabilities 397
finding > probabilities 397
finding between probabilities 397
frequency and continuous data 328
Gaussian distribution 352
height probabilities 338-341
in place of binomial distribution 389
median 352
normal probability tables 352
Poisson distribution 386, 406
Pool Puzzle 399-400
probability = area 331
probability density function 330-337, 337
probability for standardized range 347
probability of range 352
probability tables 349-352
standard score 345-347, 352
table 411
transforming 345
versus binomial distribution 393, 395
versus t-distributions 515

Normal Distribution Exposed 404

normal probabilities 359
calculating 341-352
determining distribution 343
standardizing normal variables 344
tables 349-352, 352, 658-659

nu (see ν (nu))

null hypothesis 528, 530, 543

numbers, using letters to represent 48-49

numerical data 18, 23

O

observations 222-224
independent 224
shortcuts 223

one-tailed tests 534, 539

outliers 57-59, 89-91, 93, 634
interquartile range 93
standard scores 122

P 

p-value 535-536, 539
percentage sales 12 
percentages with no frequencies 12 

percentiles 98-99, 102 
kth percentile 99, 102 
perfect negative linear correlation 631 
perfect positive linear correlation 631 

permutations and combinations 241-268 
arrangements 246 
arranging by type 252-257 
arranging duplicates 254 
arranging objects in circle 247-248
combinations 260-263, 293
examining combinations 260-263
examining permutations 258-259
factorial 246 

formulas for arrangements 248 
permutations versus combinations 261 
three-horse race 243-246

pie charts 8-9, 9, 23 
placebo 646

point estimators 443-447, 452, 493, 519
and sampling distributions 485 
for population variance 457 
problem with 489 

Poisson distribution 306-319, 324, 386, 406, 407, 412
approximating binomial distribution 398
approximating the binomial distribution 316-317
central limit theorem 482 
expectation and variance 308 
guide 319 
when λ is large 407
when λ is small 407
X + Y 312-313 

Poisson Distribution Up Close 307 
Poisson variables, combining 313 
Pool Puzzle 

binomial distribution 299-300
confidence intervals 499-500
continuity correction 399-400
discrete probability distributions 215-216 

population 418, 438 
chart 419 
mean 446 

proportion 454-455, 457
variance 448-450
versus samples 418 

(see also estimating populations and samples) 
positive and negative distances 105 
positive linear correlation 613, 631, 640 
possibility space 135 
precision, problem with 489 


probability 127-196
= area 331
adding 142, 143
Bayes’ Theorem 173, 178
best method 143
conditional 157-160
probability tree 158-161
events (see events)
for a sample 459
how probability relates to roulette 132
intersection 149-154, 153
Law of Total Probability 172, 178
measuring 132
of getting a black or even 145-146
proportion 455
range of values 329
union 149-154, 153
Venn diagram 136, 154
written as fractions, decimals, or percentages 139

probability density 334
function 330-337
never equaling 0 341

probability distributions 220, 224, 363
4X 376
binomial (see binomial distribution)
continuous data 329-333
geometric (see geometric distribution)
large number of possibilities 273, 277
letters p and q 284
new price and payouts 212-214
normal (see normal distribution)
of X + Y 372
patterns 274-277
Poisson (see Poisson distribution)
random variable X 210
standard deviation 207

Probability Distributions Up Close 202

probability tables 349-352, 352, 370, 513, 657-661
standard normal probabilities 658-659
t-distribution critical values 660
χ² (chi square) critical values 661

Probability Tables Up Close 351

probability trees 158-161, 165, 180
hints 161

proportions 9
probability 455
sampling distribution of 460
distribution of Ps 464-466
expectation of Ps 462
variance of Ps 463
standard error of 463
Q

qualitative data 18
quartiles 92 

interquartile range 92-93
lower 92, 94 
middle 92 
upper 92, 94 

questionnaires, bias 426 

R 

randomization 646 

randomized block design (experiments) 647 
random number generators 431 

random variables 202 
adding 230 
continuous 331 
independent 233 
subtracting 231 

range 86-103, 97, 329, 333 
basketball scores 88 
calculating 86 
lower bound 86 
outliers 89-91
problems with 90 
quartiles 92 
upper bound 86 

regression (see correlation and regression) 
replication 646 
response variable 608 
right-skewed data 62, 64 

roulette 129-196

black and even pockets 156
board 129-130
how probability relates to 132
independent events 184
measuring probability 132
P(Black | Even) 167-171


P(Even) 169 
possibility space 135 
probabilities 135 

probability of ball landing on 7 133-134 
sample space 135 

S



samples 418, 438 
biased 424-426 
designing 422-423 
mean 445, 446 
space 135 
survey 418, 438 
unbiased 424-426 
unreliability 420 
variance 449, 452 

(see also estimating populations and samples) 
sampling (see statistical sampling) 

sampling distribution 466 

difference between two means 652 
difference between two proportions 653 

sampling distribution of means 471-479 
distribution of X̄ 480
variance of X̄ 476

sampling distribution of proportion 460 
distribution of Ps 464-466
expectation of Ps 462
variance of Ps 463

Sampling Distribution of Proportions Up Close 469 
Sampling Distribution of the Means Up Close 479 

sampling frame 423-428, 438
bias 425 

sampling units 422, 428 
bias 425 

sampling without replacement 430 
sampling with replacement 430 
scales 12 
scatter diagrams 609, 612, 616, 618, 640
line of best fit 618 

finding equation 622 

finding slope 623-624

sum of squared errors 620-621

scatter plots (see scatter diagrams) 

segmented bar chart 14 

set theory 139 

shortcuts 370 

sigma (Σ) 49

sigma (σ) 107

significance level 533, 538, 539 
significance tests 552 

simple random sampling 430-431, 434, 436, 438
drawing lots 431 
random number generators 431 

skewed data 58-59, 64
mean 62 

Skewed Data Up Close 59 
skewed to the left 59 
skewed to the right 58-59

slope of regression line 

confidence intervals 651 
slot machines 198 

discrete random variables 202 
low versus high variance 208 
probability distributions 201 
variance 207 

split-category bar chart 14 

standard deviation 107-110, 113-117, 207, 220
from the mean 121
variance equations 113
σ (sigma) 107, 224

Standard Deviation Exposed 108 

standard error 485 
of mean 479 
of proportion 463, 466 


standardizing normal variables 344 

standard normal probabilities 658-659 

standard scores 118-122, 345-347, 352 
calculating 119 
interpreting 120 

Standard Scores Up Close 121 

statistical sampling 415-440 

bias in sampling 423-426, 434, 438 
sources 425 
choosing samples 430 
cluster sampling 433, 433-434, 436, 438 
defined 418 
designing samples 422 
drawing lots 431, 434
how it works 419 
incorrect sampling unit 425 
increasing sample size 434 
population 418, 438 
population chart 419 
populations versus samples 418 
random number generators 431 
representative sample 420 
samples 438 

unreliability 420 
sample survey 418, 438 
sampling bias 434 
sampling chart 419 
sampling frame 423-428, 438
sampling units 422, 428 
sampling without replacement 430 
sampling with replacement 430 
simple random sampling 430-438
choosing 431 
strata 432 

stratified sampling 432, 434, 436, 438 
systematic sampling 433-434, 438
target population 422, 428, 438 
unreliability 420 

statistics 

defined 2 
why learn 3 
statistics tables 657-661

standard normal probabilities 658-659 
t-distribution critical values 660 
χ² (chi square) critical values 661

stemplots 644 
stems 644 
strata 432 

stratified sampling 432-438 
stratified sampling 436 
summation symbol (Σ) 49
sum of squared errors 640 
symmetric data 59 
systematic sampling 433-434, 438 

T 

t-distributions 509-515 
probability tables 513 
shortcuts 515 
small sample 510 
standard score 511 
table 660 

versus normal distributions 515 
target population 422, 426, 428, 438 
test statistic 531, 539, 544, 547 
three-horse race 243-246
two-tailed tests 534, 539 
Type I error 555-560, 566 
Type II error 555-560, 566

U

unbiased sample 424-425 
uniform distribution 655 
union 149-154
univariate data 608, 640 


upper bounds 86, 97 
basketball scores 88 

upper quartile 92 
finding 94 

V 

variability 104-124
average distance 105 
positive and negative distances 105 
variance (see variance) 

variables 368 

probabilities involving the difference between two 369 

variance 106-113, 122, 205-208, 219, 220, 367 
binomial distribution 298, 389 
calculating 111-113
quicker way 113
geometric distribution 281-284
independent observations 378 
linear transforms 233 
of X̄ 476

Poisson distribution 308 
slot machines 207 
standard deviation 107-110
σ (sigma) 107
two games 222-224
X + Y 368 

Variance Up Close 450 

Venn diagrams 136, 139, 154, 165 
conditional probability 157 
independent events 184 

vertical bar charts 10-11, 23 

visualizing information 1-44, 19-28

categorical and numerical data 18-23
cumulative frequency 34-38
histograms 19-28
statistics 2 

(see also charts and graphs) 
Vital Statistics
A or B 153 

approximating binomial distribution 389 

approximating Poisson distribution 407 

arranging by type 254 

Bayes’ Theorem 178

combinations 263 

conditions 165 

cumulative frequency 34 

event 132 

formulas for arrangements 248 

frequency 8 

independence 184 

independent observations 224 

interquartile range 93 

Law of Total Probability 178 

linear transforms 220 

mean 54 

mode 76 

outlier 58 

percentile 99 

permutations 263 

probability 143 

quartiles 92 

range 86 

significance level 533 
skewed data 58 
standard score 346 
uniform distribution 655 
variance 106, 113


W

Watch it! 

criteria of np > 10 and nq > 10 389 
cumulative frequencies 35 
exclusive versus exhaustive 150 
how large n needs to be 465 
independent random variables 230-232
independent versus mutually exclusive 183 
linear regression 626 
percentages with no frequencies 12 
quartiles 92 
samples equation 451 
subtracting random variables 231 
X1 + X2 and 2X 223

Who Wants To Win A Swivel Chair 289, 381-386 
expectation and variance 304 
generalizing probability for three questions 293 
generalizing the probability 296 

probability of getting exactly three questions right 304 
probability of getting exactly two questions right 304 
probability of getting no questions right 304 
probability of getting two or three questions right 304 
should you play or walk away 291 

width of data 88 

X 

X + Y Distribution Up Close 368 

X - Y Distribution Up Close 369 

Z

z-scores 118-122 
calculating 119 
interpreting 120 